1. Introduction and statement of the main theorem

Let $p = (p_i)$ represent any probability vector in $\mathbb{R}^6$. This paper is concerned with a partial order $\mathfrak{E}$ among the 720
coordinatewise permutations of $p$, based on the Shannon entropy function $H(x) = -x \log x$, which is dependent only
upon the ordering of the $p_i$ and not upon their values. It arose originally in the guise of a question in quantum
information theory about classicality versus quantumness [9]; however the structure theory turns out to be quite
general. Because its natural setting is joint quantum systems, the definition requires that we stipulate 'subsystems'
of dimensions 2 and 3 and then take the entropy of the marginal probability vectors from $p$ with respect to these
subsystems. This construction brings with it a natural equivalence class structure and so the partial order is in fact
defined only upon 60 equivalence classes of these permutations, each of size 12. We summarise this as our main
theorem, as follows. Recall that the density of a partial order on a finite set of size $n$ is defined to be $r / \binom{n}{2}$, where $r$
is the number of relations which appear in the partial order, and $\binom{n}{2}$ denotes the binomial symbol.

Theorem 1. Let $G = S_6$ be the symmetric group on six letters and let $J$ be any one of the six parabolic subgroups
of $G$ which are isomorphic to the dihedral group of order 12. There is a partial order $\mathfrak{E}$ on the right coset space $J \backslash G$ of
density 0.47 whose analytical description may be given solely in terms of the Shannon entropy function $H$. Moreover
it has a concise independent algebraic description in terms of group ring elements.

The proof of this theorem, together with an in-depth analysis of the structure of $\mathfrak{E}$, essentially constitutes
the remainder of the paper. We must mention here that our description of $\mathfrak{E}$ is unfortunately incomplete: while we
believe that there are 830 relations which constitute $\mathfrak{E}$, there are nevertheless four of these relations, which we shall
refer to throughout as C4, which we have been unable to prove or disprove analytically, although the numerical
evidence for their validity is compelling. So our statements about the partial order must be read with the caveat
that there is still a possibility that some or all of the C4 are not in fact valid relations. However the structure of the
remaining 826 relations of the partial order is independent of these four.
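As a quick arithmetic check of the counts quoted above, the following Python sketch recomputes the number of equivalence classes and the density of the partial order. The variable names are illustrative only and do not come from the paper.

```python
from math import comb, factorial

# The 720 permutations of six entries fall into equivalence classes of size 12.
num_classes = factorial(6) // 12

# The density divides the number of relations by the number of unordered pairs of classes.
num_pairs = comb(num_classes, 2)

density_conjectured = 830 / num_pairs  # all 830 relations, including the four in C4
density_proven = 826 / num_pairs       # only the 826 relations independent of C4

print(num_classes, num_pairs)
print(round(density_conjectured, 2), round(density_proven, 2))
```

Both figures round to 0.47, so the density quoted in theorem 1 does not depend on the status of the four relations in C4.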

Such a partial order may in fact be described for any function $f$ instead of $H$ provided that certain convexity
conditions are met: essentially we obtain a kind of 'pseudo-norm' based upon the function $f$ that we choose. A
curious consequence is that we may describe a whole suite of functions apparently unconnected to entropy, whose
partial orders nevertheless appear numerically to mimic $\mathfrak{E}$ exactly. At one level this is not very surprising, since the
partial order is in some sense merely a discrete approximation to the curvature of the function concerned: hence
there will be many different functions whose curvature is sufficiently similar on the appropriate region of space to
give the same discrete approximation. But at another level this points to a deeper connection between certain of
these functions and discrete entropies: perhaps there is an easier way to model entropy-related phenomena for
low-dimensional joint systems than to attack the rather difficult entropy function itself. The space of relatively simple
functions which would appear to mimic the entropy function, in this albeit limited context, is incredibly varied.
For example, the function f(x) — cos(^cc) seems numerically to give exactly the same partial order as H(x), despite 
having a markedly different curvature function; the same is true of the function q(x) = (ax) 3 — (ax) 2 when a = |. 
Moreover any slight variation in the respective coefficients y or a will 'break' the respective partial order. However 
these functions are not concave on the full interval (0, 1) and so the techniques of this paper will not work on them. 

As we vary the underlying function $f$, another key question arises as to how the algebraic description needs to be
modified in order to reflect the new analytical structure. Both the analytic and algebraic approaches are rich topics 
for further study. 

The constructions here are not specific to the 6-dimensional case; however dimension 6 gives the first non-trivial
partial order and sadly also the last easily-tractable one. Even for $2 \times 4$ (which is the next interesting case), numerical
studies indicate that the number of separate relations which need to be considered is of the order of $10^5$, the $3 \times 3$
case yields around 3 million, and for $2 \times 5$ it is of the order of 20 million. Also only where the dimensions are $2 \times 2$
and $2 \times 3$ are we able to single out a definite permutation which is guaranteed to give the maximal classical mutual
information (CMI) no matter what the probability vector chosen [S]: in all other dimensions this grows into a larger
and larger set of possibilities. However the constructions of this paper may be extended to any situation where we
have joint systems of dimensions $m$ and $n$: for any sufficiently well-behaved function $f$ we obtain a binary relation
between certain permutations of the probabilities of the joint system, yielding what may be viewed as a partial order
upon (some quotient of) the symmetric group $S_{mn}$ itself. We shall always assume $2 \le m \le n$, for if $m > n$ then
the situation is identical just with the subsystems reversed; if $m = 1$ then there is nothing to be said since every
permutation will give the same result, as will be seen from the definitions below.

We conclude this introductory section with a word on how this partial order arose. Suppose that we have ordered 
the $p_i$ so that $p_1 \ge p_2 \ge p_3 \ge p_4 \ge p_5 \ge p_6$. In [9] it was shown that the permutation

$$(p_1, p_4, p_5, p_6, p_3, p_2)$$

will always yield the maximal CMI out of all of the possible permutations given by $S_6$. This built on work in [5]
and [7] which showed that the minimum CMI of all of these permutations was contained in a set of five possibilities, all 
of which do in fact occur in different examples. The results on the minima were achieved solely using considerations 
of majorisation among marginal probability vectors; however in order to prove maximality it was necessary to invoke 
a more refined entropic binary relation denoted $\triangleright$. In exploring this finer ordering we found that it did indeed give
rise to a well-defined partial order which moreover had a neat description in terms of symmetric group elements. So 
the paper is the result of this exploration. 

2. Structure of the proof of the main theorem 

We now outline how the proof of theorem 1 will proceed. First of all however we need to decipher the connection
with the parabolic subgroups $J$, since this barely appears elsewhere in the paper. The point is that because $S_6$ has
a class of non-trivial outer automorphisms we are able to study some phenomena via their image under any 
particular outer automorphism of our choosing: a trick which often makes things much clearer. Let K be the dihedral 
group corresponding to row and column swaps which we shall define in section [TT] As is easy to verify, for any J as 
described in the theorem there exists at least one outer automorphism mapping K onto J and so any partial order 
which we may define upon K\G will also give an isomorphic partial order on J\G, and vice-versa. So we define our 
partial order in its natural context on the coset space K\G and then merely translate the result into the more familiar 
language of parabolic subgroups in the statement of the theorem. Indeed there is no reason - other than the richness 
of structure which has been investigated for parabolic subgroups - for phrasing it in these terms. One could equally 
well describe the partial order on the quotient of G by any dihedral subgroup of order 12, for there are two conjugacy 
classes of subgroups of G which are dihedral of order 12 - namely the class containing K and the class containing the 
parabolics J - each of size 60, and they are mapped onto one another by the action of the outer automorphisms. 

So the proof of theorem 1 will go as follows. Once we establish the basic definitions regarding entropy, classical
mutual information, majorisation and the entropic binary relation $\triangleright$, we begin to examine each of them in the context
where two permutations differ by right multiplication by just a single transposition: first because this is the simplest
case; but secondly because it actually generates all but 5 out of 186 covering relations in the partial order. A general
rule for comparing pairs of permutations differing by more than one transposition under the entropic binary relation $\triangleright$,
moreover, seems to be very difficult: we are fortunate that only these five 'sporadic' relations exist which cannot be
generated via some concatenation of single-transposition relations. We elaborate necessary and sufficient conditions
for permutations separated by a single transposition both for majorisation and for the entropic binary relation $\triangleright$,
noting the result from [9] that majorisation implies $\triangleright$ but not vice-versa. This gives a total of 165 relations arising
from majorisation, and 90 relations arising solely from the binary entropy relation $\triangleright$: a grand total of 255 relations
arising from single transpositions. The transitive closure of these 255 relations contains 818 relations in total.

Once this is proven we shall almost have completed our description of $\mathfrak{E}$, for numerically it is easy to show that with
the exception of the 12 relations which are generated when the sporadic 5 are included, any other possible pairings 
are precluded by counterexample. So the partial order must have between 818 and 830 relations. With the two proven 
in theorem [T7| the transitive closure grows to 826 relations, leaving just the set C4 mentioned above. This completes 
the 'analytic' description of $\mathfrak{E}$.

It then remains to prove that $\mathfrak{E}$ has a neat description in terms of the group ring $\mathbb{Z}[G]$. We give an iterative
algorithm for constructing the entire web of 255 single-transposition relations referred to above starting from scratch, 
using simple rules which have no apparent connection to entropy. Of course we would not have 'seen' this description 
had it not been for the analytic work which went before; however once we know what we are looking for, the entire 
complex of 255 relations is describable in very straightforward terms. The sporadics however must be added in to 
both descriptions: there seems to be no easy way of unifying their structures with the bigger picture. 

3. Acknowledgments 

First of all thank you to Terry Rudolph and the QOLS group at Imperial College for their hospitality. I would also 
like to thank Peter Cameron, Ian Grojnowski and David Jennings for many helpful conversations. 
I. CMI, majorisation and the entropic binary relation $\triangleright$
A. The classical mutual information attached to an $m \times n$ probability matrix

Let $N$ be any positive integer and define the usual probability simplex to be

$$\Delta_N = \Big\{ (p_1, p_2, \ldots, p_{N+1}) \in \mathbb{R}^{N+1} : \sum_{i=1}^{N+1} p_i = 1 \text{ and } p_i \ge 0 \text{ for all } i \Big\}.$$

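Now consider the case where $N + 1 = mn$ is a composite number and let $p = (p_i) \in \Delta_{mn-1}$ be any probability vector:
we view this as a set of joint probabilities for two systems of size $m$ and $n$. We reflect the split into subsystems by
arranging the $p_k$ into an $m \times n$ matrix $P$ as follows:

$$
\begin{array}{c|cccc}
 & c_1 & c_2 & \cdots & c_n \\ \hline
r_1 & p_1 & p_2 & \cdots & p_n \\
r_2 & p_{n+1} & p_{n+2} & \cdots & p_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
r_m & p_{(m-1)n+1} & p_{(m-1)n+2} & \cdots & p_{mn}
\end{array}
\; = \; P. \tag{1}
$$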

As depicted above we let the row sums (which are the marginal probabilities for the first subsystem) be denoted by
$r_i = \sum_{j=1}^{n} p_{(i-1)n+j}$ for $i = 1, \ldots, m$ and similarly for the column sums (which are the marginal probabilities for the
second subsystem): $c_j = \sum_{i=1}^{m} p_{(i-1)n+j}$ for $j = 1, \ldots, n$. Then given any permutation $\sigma$ in the symmetric group $S_{mn}$
on $mn$ letters sending $p_i$ to $p_{\sigma(i)}$ we define a new $m \times n$ matrix $P^\sigma$ as follows:

$$
P^\sigma = \begin{pmatrix}
p_{\sigma(1)} & p_{\sigma(2)} & \cdots & p_{\sigma(n)} \\
p_{\sigma(n+1)} & p_{\sigma(n+2)} & \cdots & p_{\sigma(2n)} \\
\vdots & \vdots & & \vdots \\
p_{\sigma((m-1)n+1)} & p_{\sigma((m-1)n+2)} & \cdots & p_{\sigma(mn)}
\end{pmatrix},
$$

defining the appropriate marginal probabilities in a similar fashion.

To define the classical mutual information [4, §2.3] we take the sum of the entropies of the $r_i$ and the $c_j$ over all $i, j$
and then subtract the sum of the individual entropies of the $p_k$, for $k = 1, \ldots, mn$.

Definition 1. With notation as above, the classical mutual information $I(P)$ of the matrix $P$ is given by

$$I(P) = \sum_{i=1}^{m} -r_i \log r_i + \sum_{j=1}^{n} -c_j \log c_j - \sum_{k=1}^{mn} -p_k \log p_k. \tag{2}$$

We will often write $H(x) = -x \log x$ for $x \in [0, 1]$ and so we may rewrite (2) as

$$I(P) = \sum_{i=1}^{m} H(r_i) + \sum_{j=1}^{n} H(c_j) - \sum_{k=1}^{mn} H(p_k).$$
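The definition of the CMI can be sketched numerically as follows. This is a minimal illustration rather than code from the paper: the function names `H` and `cmi`, the use of NumPy, the natural logarithm and the two sample matrices are all assumptions made for the example.

```python
import numpy as np

def H(x):
    """Elementwise entropy term H(x) = -x log x, with H(0) taken to be 0."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0, x, 1.0)  # log(1) = 0, so zero entries contribute 0
    return -x * np.log(safe)

def cmi(P):
    """Classical mutual information I(P) of an m x n probability matrix P."""
    P = np.asarray(P, dtype=float)
    row_sums = P.sum(axis=1)  # r_1, ..., r_m
    col_sums = P.sum(axis=0)  # c_1, ..., c_n
    return float(H(row_sums).sum() + H(col_sums).sum() - H(P).sum())

# A product distribution has independent subsystems, so its CMI should vanish.
P_independent = np.outer([0.3, 0.7], [0.2, 0.3, 0.5])

# A 2 x 3 matrix built row by row from a non-increasing probability vector.
P_ordered = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04]).reshape(2, 3)

print(cmi(P_independent))
print(cmi(P_ordered))
```

The first value is zero up to rounding error, while the second is strictly positive because the joint distribution is not a product of its marginals.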



B. Majorisation between two elements of $S_{mn}$

For definitions and basic results connected with majorisation, see [T] and [S]. We shall use the standard symbol $\succ$
to denote majorisation between vectors. For any $m \times n$ probability matrix $M$, let us denote by $r(M) \in \mathbb{R}^m$ the
vector of marginal probabilities represented by the sums of the rows of $M$ and similarly by $c(M) \in \mathbb{R}^n$ the vector of
marginal probabilities created from the sums of the columns of $M$. Throughout the paper we shall use the symbol $H$
interchangeably for the function of one variable as well as the function on probability vectors, where if $v = (v_i) \in \mathbb{R}^N$
is any such vector then

$$H(v) = \sum_{i=1}^{N} H(v_i) = -\sum_{i=1}^{N} v_i \log v_i.$$

Lemma 2. Let $M_1, M_2$ be two probability matrices. If $r(M_1) \succ r(M_2)$ and if $c(M_1) \succ c(M_2)$, then

$$I(M_1) \le I(M_2).$$

Proof. See [6]: it follows from the fact that $H$ is a Schur-concave function [T, §11.3]. □

It should be pointed out that the converse is definitely NOT true: indeed it is this very failure which gives substance
to the definition of the entropic binary relation $\triangleright$.

Definition 2. If the hypotheses of Lemma 2 hold then we write

$$M_1 \succ M_2$$

and we shall say that $M_1$ majorises $M_2$: but this matrix terminology is not standard.

Note that definition 2 has nothing intrinsically to do with entropy: it is the fact that entropy is a Schur-concave
function which enables us to link it to majorisation [8]. By the symmetry of the entropy function $H$ upon vectors,
the relation of majorisation between matrices which we have just defined is invariant under row swaps and column
swaps; moreover if $m = n$ then it is also invariant under transposition.

From now on we shall use $p$ to denote a probability vector of length $mn$ (where $m, n$ will be clear from the context)
written in non-increasing order and $P$ to denote the corresponding $m \times n$ matrix derived from $p$ as above by successively
writing its entries along the rows. Similarly $p^\sigma$, $P^\sigma$ will denote the respective images under an element $\sigma \in S_{mn}$.
Notice that our $p$ are thereby chosen from a much smaller convex set than $\Delta_{mn-1}$, namely from the analogue of the
'positive orthant' of a vector space:

$$\mathfrak{D}_{mn} = \{ p \in \Delta_{mn-1} : p_1 \ge p_2 \ge \ldots \ge p_{mn} \}, \tag{3}$$

which is the topological closure of a fundamental domain for the action of $S_{mn}$ upon $\Delta_{mn-1}$. Henceforth all of the
probability vectors with which we shall work will be assumed to be chosen from this set $\mathfrak{D}_{mn}$; the corresponding set
of matrices (constructed from each $p \in \mathfrak{D}_{mn}$ as above and therefore also with entries in non-increasing order as we
go along successive rows) will be denoted $\mathfrak{M}_{mn}$.

Definition 3. Let $\sigma, \sigma' \in S_{mn}$. If $r(P^\sigma) \succ r(P^{\sigma'})$ and if $c(P^\sigma) \succ c(P^{\sigma'})$ for all $P \in \mathfrak{M}_{mn}$ then we write

$$\sigma \succ \sigma'$$

and we shall say that $\sigma$ majorises $\sigma'$: but again this terminology is not standard.
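The majorisation conditions of definitions 2 and 3 can be tested for a particular matrix with a short Python sketch. The helper names `majorises` and `matrix_majorises`, the tolerance and the sample vector are illustrative assumptions. Note that definition 3 quantifies over every ordered matrix, so a check on a single sample is only a necessary test.

```python
import numpy as np

def majorises(u, v, tol=1e-12):
    """True if u majorises v: equal totals, and each partial sum of the
    decreasingly sorted u is at least the matching partial sum of v."""
    u = np.sort(np.asarray(u, dtype=float))[::-1]
    v = np.sort(np.asarray(v, dtype=float))[::-1]
    if len(u) != len(v) or abs(u.sum() - v.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(u) >= np.cumsum(v) - tol))

def matrix_majorises(M1, M2):
    """Definition 2: row sums and column sums of M1 both majorise those of M2."""
    M1, M2 = np.asarray(M1, dtype=float), np.asarray(M2, dtype=float)
    return majorises(M1.sum(axis=1), M2.sum(axis=1)) and \
           majorises(M1.sum(axis=0), M2.sum(axis=0))

a, b, c, d, e, f = 0.40, 0.25, 0.15, 0.10, 0.06, 0.04
M1 = [[a, b, c], [d, e, f]]  # entries written in order along the rows
M2 = [[a, d, e], [f, c, b]]  # a rearrangement of the same entries

print(matrix_majorises(M1, M2), matrix_majorises(M2, M1))
```

For this sample the first matrix majorises the second but not conversely, so Lemma 2 gives $I(M_1) \le I(M_2)$ here.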

We are now ready to define a finer relation than the one which majorisation gives upon permutations of a fixed 
probability vector. This relation is the key to all of the results in this paper. 



C. Definition of the entropic binary relation $\triangleright$ between two elements of $S_{mn}$

If we consider the class of $(mn)!$ matrices formed by permuting the entries in the matrix $P$ in (1) under the full
symmetric group $S_{mn}$ and then look at the CMI of each of the resulting matrices, there is a rigid a priori partial
order which holds between them, and which does not vary as $P$ moves over the whole of $\mathfrak{M}_{mn}$. That is to say, it does
not depend on the sizes of the $\{p_i\}$ but only upon their ordering. In low dimensions, much of the partial
order can be explained by majorisation considerations. However there is a substantial set of relations which depends
on a much finer graining than majorisation gives. In dimension 6 this fine-graining will become our entropic partial
order $\mathfrak{E}$.

We denote the individual relational operator by $\triangleright$ and define it as follows.

Definition 4. Given permutations $\sigma, \sigma' \in S_{mn}$ we say that

$$\sigma \triangleright \sigma'$$

if it can be shown that $I(P^{\sigma'}) - I(P^\sigma)$ is non-negative for all $P \in \mathfrak{M}_{mn}$. That is to say, given an ordered matrix
$P \in \mathfrak{M}_{mn}$, the relation $I(P^\sigma) \le I(P^{\sigma'})$ holds irrespective of the relative sizes of the entries. This is the same
as saying that

$$H(r(P^\sigma)) + H(c(P^\sigma)) \le H(r(P^{\sigma'})) + H(c(P^{\sigma'}))$$

for all $P \in \mathfrak{M}_{mn}$.

In order to keep the notation consistent with that of majorisation, we have adopted the convention that $\sigma \triangleright \sigma'$
corresponds to $I(P^\sigma) \le I(P^{\sigma'})$ for all $P \in \mathfrak{M}_{mn}$.
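Definition 4 quantifies over all ordered matrices, so it cannot be verified by sampling; sampling can however refute a candidate relation by producing a counterexample. The sketch below is an illustration under stated assumptions: the function names, the Dirichlet sampling of ordered vectors, the zero-based indexing and the tolerance are choices made for this example, and the second permutation is the index form of the arrangement $(p_1, p_4, p_5, p_6, p_3, p_2)$ quoted in the introduction.

```python
import numpy as np

def marginal_entropy(p, sigma, m, n):
    """H(r(P^sigma)) + H(c(P^sigma)), where position i of P^sigma holds p[sigma[i]]."""
    P = np.asarray(p, dtype=float)[list(sigma)].reshape(m, n)
    def H(v):
        v = v[v > 0]
        return float(-(v * np.log(v)).sum())
    return H(P.sum(axis=1)) + H(P.sum(axis=0))

def find_counterexample(sigma, sigma_prime, m, n, trials=2000, seed=0, tol=1e-12):
    """Search for an ordered p violating the inequality required by sigma |> sigma_prime.
    Returns such a p, or None if no violation was found in the sampled vectors."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        p = np.sort(rng.dirichlet(np.ones(m * n)))[::-1]  # non-increasing order
        if marginal_entropy(p, sigma, m, n) > marginal_entropy(p, sigma_prime, m, n) + tol:
            return p
    return None

identity = (0, 1, 2, 3, 4, 5)
claimed_max = (0, 3, 4, 5, 2, 1)  # rows (p1, p4, p5) and (p6, p3, p2)

print(find_counterexample(identity, claimed_max, 2, 3))  # no violation expected
print(find_counterexample(claimed_max, identity, 2, 3))  # a violating vector
```

A return value of `None` is only evidence for a relation, never a proof; a returned vector, on the other hand, rules the relation out.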
Remark. A key observation at this stage is that the partial order is not really connected with the notion of classical
mutual information (CMI) so much as it is with entropy itself, for the term which is the sum of the entropies of the
individual joint probabilities is common to all permutations of a given fixed matrix $P$, and so as we pointed out in
definition 4 the ordering depends only upon the relative sizes of the sums of the entropies of the marginal probability
vectors. Indeed, nothing meaningful may be said within this framework about any relation between the CMI of matrices
whose (sets of) entries are distinct: the ordering is effectively concerned solely with permutations.

Now majorisation implies $\triangleright$, but not vice-versa: we have the following result which was proven in [9]. The notation
$(\alpha, \beta)$ for the transposition will be clarified in the next section.

Proposition 3. Let $\sigma \in S_{mn}$ and let $\tau = (\alpha, \beta) \in S_{mn}$ be the transposition swapping elements $\alpha$ and $\beta$. Then

$$(\sigma \succ \sigma\tau) \Rightarrow (\sigma \triangleright \sigma\tau).$$

Furthermore if $\alpha$ and $\beta$ belong to the same row or column of the corresponding $m \times n$ matrix, then the two notions
are the same. □

We now explore the relations which arise from single transpositions. 



D. The entropic binary relation $\triangleright$ for a single transposition

In order to see what the entropic binary relation is in the case which will most interest us, that of a single
transposition, we once again consider a general $m \times n$ probability matrix $P = (p_i) \in \mathfrak{M}_{mn}$ as depicted in (1). Let $\sigma$
be some element of $G$, so our starting matrix will be $P^\sigma$. Let $\tau$ be any transposition acting on $P^\sigma$, interchanging two
elements which we shall refer to as $\alpha$ and $\beta$ (by a slight abuse of notation, since the positions and the values will be
referred to by the same symbols). The following diagram illustrates this action of $\tau$ on $P^\sigma$: we write $P^{\tau\sigma} = (P^\sigma)^\tau$
for the image of $P^\sigma$ under $\tau$ since we always write abstract group actions on the left; but note that when it comes to
the comparison we are trying to effect between group elements then since $\tau$ actually multiplies $\sigma$ on the right, we will
be comparing $\sigma$ with $\sigma\tau$ as required.



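$$
P^\sigma = \begin{pmatrix}
p_{\sigma(1)} & p_{\sigma(2)} & \cdots & p_{\sigma(n)} \\
p_{\sigma(n+1)} & p_{\sigma(n+2)} & \cdots & p_{\sigma(2n)} \\
\vdots & \vdots & & \vdots \\
p_{\sigma((m-1)n+1)} & p_{\sigma((m-1)n+2)} & \cdots & p_{\sigma(mn)}
\end{pmatrix}, \quad \text{with two of its entries denoted } \alpha \text{ and } \beta, \tag{4}
$$

and under the action of $\tau$ this is mapped to:

$$
P^{\tau\sigma} = \text{the matrix in (4) with the entries } \alpha \text{ and } \beta \text{ interchanged}. \tag{5}
$$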



Without loss of generality we may stipulate that as matrix entries $\alpha > \beta$ (if they are equal there is nothing
to be done). We wish to compare $I(P^\sigma)$ with $I(P^{\tau\sigma})$. Note firstly that by the definition of CMI, the difference
$I(P^{\tau\sigma}) - I(P^\sigma)$ depends only on the rows and columns containing $\alpha, \beta$: all of the rest of the terms vanish as they
are not affected by the action of $\tau$. We denote by $r_\alpha$ (respectively $r_\beta$) the sum of the entries in the row of $P^\sigma$ which
contains $\alpha$ (resp. $\beta$), and by $c_\alpha$ (respectively, $c_\beta$) the sum of the entries in the column of $P^\sigma$ which contains $\alpha$ (resp.
$\beta$). Similarly, we denote by $r^\tau_\alpha, r^\tau_\beta, c^\tau_\alpha, c^\tau_\beta$ the image of these quantities under the action of $\tau$. See the diagrams (4), (5)
above.

NB: $r^\tau_\alpha, c^\tau_\alpha$ (respectively, $r^\tau_\beta, c^\tau_\beta$) no longer contain $\alpha$ (respectively $\beta$), but rather $\beta$ (respectively $\alpha$).
Thus when $\alpha$ and $\beta$ lie in different rows we have $r^\tau_\alpha = r_\alpha - (\alpha - \beta)$ and $r^\tau_\beta = r_\beta + (\alpha - \beta)$, and similarly for the columns.
So the quantity we are interested in becomes



$$
\begin{aligned}
I(P^{\tau\sigma}) - I(P^\sigma) &= H(r(P^{\tau\sigma})) + H(c(P^{\tau\sigma})) - H(r(P^\sigma)) - H(c(P^\sigma)) \\
&= H(r^\tau_\alpha) - H(r_\alpha) + H(r^\tau_\beta) - H(r_\beta) + H(c^\tau_\alpha) - H(c_\alpha) + H(c^\tau_\beta) - H(c_\beta),
\end{aligned} \tag{6}
$$

with the proviso that if $\alpha$ and $\beta$ happen to be in the same row (respectively column) then the $r_*$ (respectively, $c_*$)
terms vanish. The terms in (6) are grouped in pairs of the form $\pm(H(x + (\alpha - \beta)) - H(x))$, which means we may
write it in a more suggestive form:



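$$
I(P^{\tau\sigma}) - I(P^\sigma) = (\alpha - \beta) \left[ -\frac{H(r_\alpha) - H(r^\tau_\alpha)}{\alpha - \beta} + \frac{H(r^\tau_\beta) - H(r_\beta)}{\alpha - \beta} - \frac{H(c_\alpha) - H(c^\tau_\alpha)}{\alpha - \beta} + \frac{H(c^\tau_\beta) - H(c_\beta)}{\alpha - \beta} \right]. \tag{7}
$$

To take advantage of the link with calculus, we introduce Lagrangian means [S, VI §2.2].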



Definition 5. Let $\varphi$ be a continuously differentiable and strictly convex or strictly concave function defined on a real
interval $J$, with first derivative $\varphi'$. Define the Lagrangian mean $\mu_\varphi$ associated with $\varphi$ to be:

$$
\mu_\varphi(x, y) = \begin{cases}
(\varphi')^{-1}\left( \dfrac{\varphi(y) - \varphi(x)}{y - x} \right) & \text{if } y \ne x, \\
x & \text{if } y = x,
\end{cases} \tag{8}
$$

for any $x, y \in J$, where $(\varphi')^{-1}$ denotes the unique inverse of $\varphi'$.

In other words, $\mu_\varphi$ is the function which arises from the Lagrangian mean value theorem in the process of going
from the points $(x, \varphi(x))$ and $(y, \varphi(y))$ subtending a secant on the curve of $\varphi$, to the unique point in $[x, y]$ where the
slope of the tangent to the curve $\varphi$ is equal to that of the secant. See figure 1. Note that the hypothesis about strict
convexity/concavity is necessary in order to ensure the uniqueness of the inverse of the derivative.



FIG. 1. Definition of $\mu_\varphi$ (plot of the curve $Y = \varphi(X)$ with the marked point $(a, \varphi(a))$).



If we focus on the case where $\varphi = H$, which is continuously differentiable and strictly concave on $J = [0, 1]$, we may
rewrite (7) as:

$$
I(P^{\tau\sigma}) - I(P^\sigma) = (\alpha - \beta) \left( -H'(\mu_H(r^\tau_\alpha, r_\alpha)) + H'(\mu_H(r_\beta, r^\tau_\beta)) - H'(\mu_H(c^\tau_\alpha, c_\alpha)) + H'(\mu_H(c_\beta, c^\tau_\beta)) \right);
$$

indeed $\varphi'(x) = H'(x) = -(1 + \log(x))$ and so this becomes:

$$
I(P^{\tau\sigma}) - I(P^\sigma) = (\alpha - \beta) \log \left( \frac{\mu_H(r^\tau_\alpha, r_\alpha)\, \mu_H(c^\tau_\alpha, c_\alpha)}{\mu_H(r_\beta, r^\tau_\beta)\, \mu_H(c_\beta, c^\tau_\beta)} \right).
$$

Since $(\alpha - \beta) > 0$, in order to determine which of the matrices gives higher CMI we only need consider the relative
sizes of the numerator and denominator of the argument of the logarithm. So it is enough to study the quantity

$$\mu_H(r^\tau_\alpha, r_\alpha)\, \mu_H(c^\tau_\alpha, c_\alpha) - \mu_H(r_\beta, r^\tau_\beta)\, \mu_H(c_\beta, c^\tau_\beta), \tag{9}$$

as we did in [9] and as we do for the entropic binary relation below. 
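The passage from the entropy differences to the logarithm of a ratio of Lagrangian means can be checked numerically. The following Python sketch is illustrative only: the function names, the sample matrix and the zero-based positions are assumptions, and the closed form used for the mean of $H$ is obtained by solving $H'(\mu) = (H(y) - H(x))/(y - x)$ with $H'(\mu) = -(1 + \log \mu)$.

```python
import numpy as np

def H(x):
    return -x * np.log(x)

def identric_mean(x, y):
    """Lagrangian mean of H for positive x, y, with the convention mu(x, x) = x."""
    if np.isclose(x, y):
        return float(x)
    return float(np.exp(-1.0 + (y * np.log(y) - x * np.log(x)) / (y - x)))

def marginal_entropy(P):
    return float(H(P.sum(axis=1)).sum() + H(P.sum(axis=0)).sum())

def transposition_difference(P, pos_alpha, pos_beta):
    """Return (direct CMI difference, logarithmic formula) for swapping two entries of P."""
    Q = P.copy()
    Q[pos_alpha], Q[pos_beta] = P[pos_beta], P[pos_alpha]
    alpha, beta = P[pos_alpha], P[pos_beta]
    (ia, ja), (ib, jb) = pos_alpha, pos_beta
    r_a, r_a_tau = P[ia].sum(), Q[ia].sum()
    r_b, r_b_tau = P[ib].sum(), Q[ib].sum()
    c_a, c_a_tau = P[:, ja].sum(), Q[:, ja].sum()
    c_b, c_b_tau = P[:, jb].sum(), Q[:, jb].sum()
    numerator = identric_mean(r_a_tau, r_a) * identric_mean(c_a_tau, c_a)
    denominator = identric_mean(r_b, r_b_tau) * identric_mean(c_b, c_b_tau)
    # The joint-entropy term of the CMI is unchanged by a swap, so only marginals matter.
    direct = marginal_entropy(Q) - marginal_entropy(P)
    formula = float((alpha - beta) * np.log(numerator / denominator))
    return direct, formula

P = np.array([[0.30, 0.10, 0.25],
              [0.05, 0.20, 0.10]])

# alpha = 0.30 at position (0, 0) and beta = 0.10 at position (1, 2): a diagonal swap.
print(transposition_difference(P, (0, 0), (1, 2)))
```

The two returned numbers agree up to rounding error. With the convention $\mu_H(x, x) = x$ the same function also covers swaps within a single row or column, where the corresponding pair of factors cancels.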

We are now in a position to re-state what is meant by $\triangleright$ for this special case of a transposition.

Lemma 4. With notation as above, $\sigma \triangleright \sigma\tau$ if and only if it can be shown that the quantity in (9) is non-negative for
all $P \in \mathfrak{M}_{mn}$. □

For convenience later on we state the following sufficient condition for $\triangleright$ which is proven in [9]. Consider the four
terms which constitute the first arguments of the function $\mu_H$ in (9), namely

$$r^\tau_\alpha, \; c^\tau_\alpha, \; r_\beta, \; c_\beta. \tag{10}$$

Observe that there are no a priori relationships between the sizes of these quantities. Let us consider the possible
orderings of the four terms based upon what we know of the ordering of the matrix elements of $P$. In principle there
are 24 such possibilities; however in certain instances of small dimension such as our $2 \times 3$ case, most of these may be
eliminated and we are left with only a few orderings.

Proposition 5. Suppose that the minimum element in (10) is either $r_\beta$ or $c_\beta$. In addition suppose that we can verify
that $r_\beta + c_\beta \le r^\tau_\alpha + c^\tau_\alpha$ holds for any $P \in \mathfrak{M}_{mn}$. Then $\sigma \triangleright \sigma\tau$.

Conversely, suppose that the minimum element in (10) is either $r^\tau_\alpha$ or $c^\tau_\alpha$ and in addition suppose that we can verify
that $r_\beta + c_\beta \ge r^\tau_\alpha + c^\tau_\alpha$ holds for any $P \in \mathfrak{M}_{mn}$. Then $\sigma\tau \triangleright \sigma$. □

E. Properties of the identric mean $\mu_H$

We now prove some facts specifically about $\mu_H$ which will give us an insight into the sign of the quantity in (9).

Lemma 6. Fix $t \in (0, 1)$. For $x \in (0, 1 - t)$:

(i) $\mu_H(x, x + t) > 0$ and is strictly monotonically increasing in $x$;

(ii) $\mu_H(x, x + t)$ is strictly concave in $x$;

(iii) $\frac{1}{e} < \frac{1}{t}\left(\mu_H(x, x + t) - x\right) < \frac{1}{2}$;

(iv) $\frac{\partial}{\partial x}\left(\mu_H(x, x + t)\right)$ is strictly monotonically increasing in $t$ for fixed $x$.

Let $\delta \in (0, 1 - t - x)$.

(v) $\frac{\mu_H(x + \delta, x + \delta + t)}{\mu_H(x, x + t)}$ is monotonic decreasing in $t$ for fixed $x$;

(vi) $\frac{\mu_H(x + \delta, x + \delta + t)}{\mu_H(x, x + t)}$ is monotonic decreasing in $x$ for fixed $t$.

Now let $0 < p < q < r < s$, with $t$ as above.

(vii) Suppose that $\frac{\mu_H(q, q + t)\, \mu_H(r, r + t)}{\mu_H(p, p + t)\, \mu_H(s, s + t)} > 1$. Then $qr > ps$.

Let $y > x > 0$: then we note that (iii) says that the Lagrangian mean of $x$ and $y$ occurs between $x + \frac{y - x}{e}$ and
$x + \frac{y - x}{2}$. Both extremes occur in the limit, so a priori we cannot narrow the range down further than this.

Proof. First, solving (8) explicitly for $\varphi = H$ we see that $\mu_H$ is in fact what is known as the identric mean of $x$ and $y$:

$$\mu_H(x, y) = e^{-1} \left( \frac{y^y}{x^x} \right)^{\frac{1}{y - x}},$$

or if we set $t = y - x$:

$$\mu_H(x, x + t) = e^{-1} \left( \frac{(x + t)^{x + t}}{x^x} \right)^{\frac{1}{t}} = e^{-1} (x + t) \left( 1 + \frac{t}{x} \right)^{\frac{x}{t}}.$$
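The closed forms for the identric mean can be compared against the defining equation of the Lagrangian mean with a few lines of Python. This is a sketch under stated assumptions: the function names and the sample points are illustrative, and the loop also prints the scaled offset $(\mu_H(x, x + t) - x)/t$ that part (iii) of the lemma places between $1/e$ and $1/2$.

```python
import math

def H(u):
    return -u * math.log(u)

def lagrangian_mean_H(x, y):
    """Solve H'(mu) = (H(y) - H(x)) / (y - x) for mu, using H'(mu) = -(1 + log mu)."""
    slope = (H(y) - H(x)) / (y - x)
    return math.exp(-1.0 - slope)

def identric_two_point(x, y):
    """Closed form e^{-1} (y^y / x^x)^{1 / (y - x)}."""
    return math.exp(-1.0) * (y ** y / x ** x) ** (1.0 / (y - x))

def identric_shifted(x, t):
    """Closed form e^{-1} (x + t) (1 + t / x)^{x / t}."""
    return math.exp(-1.0) * (x + t) * (1.0 + t / x) ** (x / t)

sample_points = [(0.01, 0.45), (0.2, 0.3), (0.5, 0.1)]
for x, t in sample_points:
    values = (lagrangian_mean_H(x, x + t),
              identric_two_point(x, x + t),
              identric_shifted(x, t))
    scaled_offset = (values[0] - x) / t
    print(x, t, values, scaled_offset)
```

The three expressions agree at each sample point, and the scaled offset moves towards $1/e$ when $x$ is small relative to $t$ and towards $1/2$ when $t$ is small relative to $x$.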



Now parts (i) and (ii) are proven as lemma 6 of [9] and since (iii) follows by similar techniques we omit the proof.

Part (iv) follows by taking the derivative $\frac{\partial^2}{\partial t\, \partial x}\left(\mu_H(x, x + t)\right)$ and observing that its sign is the same as the sign of

$$\frac{t^2}{x(x + t)} - \log^2\left(1 + \frac{t}{x}\right),$$

which is shown to be positive in the course of proving the above lemma in [5].

Part (vi) follows directly by taking the partial derivative of $\frac{\mu_H(x + \delta, x + \delta + t)}{\mu_H(x, x + t)}$ with respect to $x$; taking the partial
derivative with respect to $t$ instead we see that part (v) boils down to the inequality

$$\left(1 + \frac{t}{x}\right)^{x} < \left(1 + \frac{t}{x + \delta}\right)^{x + \delta},$$

which on taking derivatives is seen to be a standard fact

$$\log\left(1 + \frac{t}{x}\right) > \frac{t}{x + t}$$

about logarithms [S].

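To prove (vii): let $w < x < y < z$ be any 4 positive real numbers arranged in the order shown, and let $\lambda \ge 0$.
Define a positive real function

$$\chi_\lambda = \chi_\lambda(w, x, y, z) = \frac{(w + \lambda)^{(w + \lambda)} (z + \lambda)^{(z + \lambda)}}{(x + \lambda)^{(x + \lambda)} (y + \lambda)^{(y + \lambda)}}.$$

Then it follows from the explicit form for $\mu_H$ above that

$$\frac{\mu_H(q, q + t)\, \mu_H(r, r + t)}{\mu_H(p, p + t)\, \mu_H(s, s + t)} = \left[ \frac{\chi_0(p, q, r, s)}{\chi_t(p, q, r, s)} \right]^{\frac{1}{t}}.$$

From now on we shall simply write $\chi_\lambda$ for $\chi_\lambda(p, q, r, s)$, for any $\lambda \ge 0$, with $0 < p < q < r < s$ understood as in the
statement of the lemma. Since $t > 0$ by assumption and since the term in square brackets is always positive it follows
that

$$\frac{\mu_H(q, q + t)\, \mu_H(r, r + t)}{\mu_H(p, p + t)\, \mu_H(s, s + t)} > 1 \iff \frac{\chi_0}{\chi_t} > 1.$$

So to prove (vii) it is enough to show that

$$\frac{\chi_0}{\chi_t} > 1 \Rightarrow qr > ps.$$

Now

$$\chi'_\lambda = \frac{\partial}{\partial \lambda}(\chi_\lambda) = \chi_\lambda \log \frac{(p + \lambda)(s + \lambda)}{(q + \lambda)(r + \lambda)} \tag{11}$$

and so for $\lambda \ge 0$ (again noting that $\chi_\lambda$ is always positive) it follows that the sign of $\chi'_\lambda$ is exactly the sign of

$$(p + \lambda)(s + \lambda) - (q + \lambda)(r + \lambda) = ps - qr + ((p + s) - (q + r))\lambda. \tag{12}$$

So suppose $qr \le ps$. Lemma 4 of [9] shows that $q + r \ge p + s \Rightarrow qr > ps$, whence

$$qr \le ps \Rightarrow q + r < p + s,$$

which shows that the sign in (12) must be positive for all $\lambda > 0$. So $\chi_\lambda$ is an increasing function of $\lambda > 0$, which
means in particular that $\frac{\chi_0}{\chi_t} < 1$.

Hence $\frac{\chi_0}{\chi_t} > 1$ must indeed imply that $qr > ps$, as claimed. □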
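The two steps of this argument, namely the rewriting of the ratio of identric means through $\chi_0 / \chi_t$ and the implication in part (vii), can be spot-checked numerically. The Python sketch below is an illustration and not a proof: the function names, the uniform sampling of $p < q < r < s$, the fixed value $t = 0.2$ and the small safety margin are all assumptions made for the example.

```python
import math
import random

def identric_shifted(x, t):
    """Identric mean mu_H(x, x + t) in the closed form e^{-1} (x + t) (1 + t/x)^{x/t}."""
    return math.exp(-1.0) * (x + t) * (1.0 + t / x) ** (x / t)

def chi(lam, p, q, r, s):
    """chi_lambda(p, q, r, s), computed through logarithms of u**u for stability."""
    g = lambda u: u * math.log(u)
    return math.exp(g(p + lam) + g(s + lam) - g(q + lam) - g(r + lam))

def mean_ratio(p, q, r, s, t):
    numerator = identric_shifted(q, t) * identric_shifted(r, t)
    denominator = identric_shifted(p, t) * identric_shifted(s, t)
    return numerator / denominator

def check_part_vii(trials=2000, t=0.2, seed=0, margin=1e-9):
    """Count sampled quadruples with ratio > 1, asserting qr > ps for each of them."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p, q, r, s = sorted(rng.uniform(1e-3, 1.0 - t) for _ in range(4))
        ratio = mean_ratio(p, q, r, s, t)
        via_chi = (chi(0.0, p, q, r, s) / chi(t, p, q, r, s)) ** (1.0 / t)
        assert math.isclose(ratio, via_chi, rel_tol=1e-9)
        if ratio > 1.0 + margin:
            hits += 1
            assert q * r > p * s
    return hits

print(check_part_vii())
```

The function raises an assertion error if either the identity or the implication fails on a sampled quadruple, and otherwise returns how many samples had a ratio exceeding 1.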

Remark. The significance of condition (vii) of lemma 6 is that it may be used to derive a necessary condition for $\triangleright$,
which in the $2 \times 3$ case in combination with proposition 5 yields necessary and sufficient conditions for the relation $\triangleright$
between two permutations related by a single transposition. See theorem 15 below.






II. The analytical construction of an entropic partial order $\mathfrak{E}$ for the $2 \times 3$ case

A. The entropic relation $\triangleright$ does give rise to a partial order

Let $m, n \in \mathbb{N}$ be arbitrary. So far we have constructed an abstract framework for the study of the binary relation $\triangleright$
between elements of $S_{mn}$ based on the entropy function $H$. Moreover we have shown that it is a necessary condition
for 'majorisation' between matrices related by a permutation, in the sense of definition 2. We now prove that it does
indeed give rise to a partial order on the quotient of $S_{mn}$ by the subgroup $K_{mn}$ generated by the appropriate
$H$-invariant (and so also CMI-invariant) matrix transformations.

Proposition 7. The binary relation $\triangleright$ gives a well-defined partial order on the coset space of the symmetric group $S_{mn}$
modulo its subgroup $K = K_{mn}$ of row- and column-swaps (together with the transpose operation if $m = n$).

Proof. From its definition we see immediately that $\triangleright$ is reflexive and transitive. It is also anti-symmetric: let $T_K$ be
any right transversal of $K$ in $G = S_{mn}$. We need to show that if there exists a pair $\sigma, \sigma' \in T_K$ for which both $\sigma \triangleright \sigma'$
and $\sigma' \triangleright \sigma$ hold simultaneously (meaning of course that $I(P^\sigma) = I(P^{\sigma'})$ for all $P \in \mathfrak{M}_{mn}$) then in fact $\sigma = \sigma'$.

We proceed by a kind of induction on the number of transpositions needed to express a~ 1 o l . Suppose that a single 
transposition r takes a to a 1 : 

a" V = r. 

Our hypothesis that I{P a ) = I(P a ) for all P £ V)l m n means that the quantity in ^ is always zero; hence in 
particular its derivative with respect to t — a — j3 will be zero. Recall the function x\ which we defined in order to 
study the effect of varying t inside the expression ^ : if we look at its first partial derivative with respect to A we 



find the expression in (111. Now by our hypothesis the value of \x is always 1 and so the expression (111 reduces 



to log fej^jfej^] with the p, q, r, s being some appropriate ordering of the four terms in ( 10 ). Our hypothesis implies 
this is identically zero, which clearly is nonsense as we vary A provided we do not always have equality between the 
sets {p, s} and {q,r}, which in the general case we do not. So for the case where a~ Y a' is assumed to be a single 
transposition we have produced a contradiction: so indeed a = a' '. 
Next suppose that 

CT~V = TlT 2 , 

a product of two distinct transpositions. Without loss of generality we may assume that $\tau_1$ interchanges two positions
which 'bracket' at most one of the positions interchanged by $\tau_2$ (in the sense that if the two positions swapped by $\tau_1$ are
occupied by the same value x then that must also be true of every other position which is in-between these positions in
the ordering of the entries, and hence at most one of those positions swapped by $\tau_2$ will be forced to be occupied by the
same number x, but not both). If this is not the case we swap $\tau_1$ with $\tau_2$ and the argument will go through unchanged.
So let P be such a matrix, where the two positions swapped by $\tau_1$ are occupied by the same value $\delta$ say, but where
one or both of the positions swapped by $\tau_2$ (depending on whether there is an overlap of one of them with $\tau_1$) are
assigned one or two different values. The key thing is that the values for $\tau_2$ be different from one another and that at
least one of them be different from $\delta$. By construction the transposition $\tau_1$ will have no effect on the CMI of $P^\sigma$, so by
our hypotheses the transposition $\tau_2$ cannot change the value either. Since we have factored out by the K-symmetry
of the matrices, by the strict Schur-concavity of the entropy function [SJ §3A] any two distinct column sum vectors
(respectively, row sum vectors) which are not permutations of one another will yield different entropies, and therefore
ceteris paribus different CMIs. Now if $\tau_2$ were to swap two elements of the same row, then clearly the column sum
vector would change but the row sum vector would not, giving a different CMI; a similar argument goes for two elements
of the same column. So $\tau_2$ must be a diagonal transposition (see definition 6), swapping elements which lie both in
different rows and in different columns; moreover the difference between the entropy of the row vectors before and
after the action by $\tau_2$ must be exactly equal to that between the column vectors, with the opposite sign. But then
we are back to the convexity argument for the case of a single transposition above.

The general case follows by the same argument, noting that we may have to reduce either to the first or the last
transposition in the expression for $\sigma^{-1}\sigma'$ depending on the 'bracketing' effect mentioned above. □



B. The existence of a unique maximum CMI configuration in the 2x3 case 



For the rest of the paper we specialise to the case where m = 2 and n = 3, and we shall often merely state many of
the results from [5]. The sections on definitions are identical in many places to those in [5] but are reproduced here 
for convenience. 
From now on we denote our six probabilities by {a, b, c, d, e, f} and assume that they satisfy $a \geq b \geq c \geq d \geq e \geq f \geq 0$
and a + b + c + d + e + f = 1. In the main we shall treat these as though they were strict inequalities in order
to derive sharper results. However we shall occasionally require recourse to the possibility that one or more of the
relations be an equality: see for example the proof of theorem 15.

We state the main theorem from [5]: 

Theorem 8. The matrix

$$\begin{pmatrix} a & d & e \\ f & c & b \end{pmatrix}$$

has maximal CMI among all 720 possible 2x3 arrangements of {a, b, c, d, e, f}.

This is the case irrespective of the relative sizes of a, b, c, d, e, f. □

Remark. It is worth pointing out that one may arrive at the conclusion of theorem 8 by a process of heuristic
reasoning, as follows. Recall from definition [7] that the CMI consists of three components, of which the last one is 
identical for all matrices which are permutations of one another. So in order to understand maxima/minima we 
restrict our focus to the first two terms, namely the entropies of the marginal probability vectors. Now entropy is a 
measure of the 'randomness' of the marginal probabilities: the more uniform they are the higher will be the contribution 
to the CMI from these row and column sum vectors. Beginning with the columns since in general they will contribute 
more to the overall entropy, if we look at the a priori ordering a > b > c > d > e > f it is evident that the most
uniform way of selecting pairs in general so as to be as close as possible to one another would be to begin at the outside
and work our way in: namely the column sum vector should read (a + f, b + e, c + d). Similarly for the row sums:
we need to add small terms to a, but the position of f is already taken in the same column as a, so that just leaves d 
and e in the top row, and c and b fill up the bottom row in the order dictated by the column sums. See also the final 
appendix of fdjj where in fact we can achieve a total ordering by the same method for the simpler case of 2x2 matrices. 
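
The heuristic above can be checked numerically for any particular choice of probabilities. The following is a minimal sketch only: it assumes that the CMI of a matrix is the sum of the entropies of its row sum and column sum vectors minus the entropy of the six entries (the three components recalled in the remark), and the sample values of a, ..., f are hypothetical.

```python
from itertools import permutations
from math import log

def entropy(v):
    """Shannon entropy H(v) = -sum v_i log v_i (natural logarithm)."""
    return -sum(x * log(x) for x in v if x > 0)

def cmi(m):
    """CMI of a 2x3 arrangement m = (m0, ..., m5), read row by row.

    Assumed form: H(row sums) + H(column sums) - H(all six entries).
    """
    rows = (m[0] + m[1] + m[2], m[3] + m[4] + m[5])
    cols = (m[0] + m[3], m[1] + m[4], m[2] + m[5])
    return entropy(rows) + entropy(cols) - entropy(m)

# Hypothetical sample values with a > b > c > d > e > f and sum 1.
a, b, c, d, e, f = 0.35, 0.25, 0.18, 0.12, 0.07, 0.03

# Brute force over all 720 arrangements.
best = max(permutations((a, b, c, d, e, f)), key=cmi)

# The arrangement named in theorem 8: top row (a, d, e), bottom row (f, c, b).
claimed = (a, d, e, f, c, b)

# Column pairs of the maximiser, independent of row and column order.
best_columns = {frozenset((best[i], best[i + 3])) for i in range(3)}
```

The maximiser returned by the search is only determined up to row and column swaps, which is why the comparison is made on the set of column pairs rather than on the tuple itself.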



C. The canonical matrix class representatives $\mathcal{R}_{2\times3}$ and the identification with a quotient of $S_6$

There are 6! = 720 possible permutations of the fixed probabilities {a, b, c, d, e, f}, giving a set of matrices in the
usual way which we shall refer to throughout as $\mathcal{M}_{2\times3}$. However since simple row and column swaps do not change the
CMI, and since there are $12 = |S_3| \cdot |S_2|$ such swaps, we are reduced to only 60 = 720/12 different possible values for
the CMI (provided that the probabilities {a, b, c, d, e, f} are all distinct: clearly repeated values within the elements
will give rise to fewer possible CMI values). We now classify these 60 classes of matrices according to rules which will
make our subsequent proofs easier, defining a fixed set of matrices which will be referred to as $\mathcal{R}_{2\times3}$.
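
The count 60 = 720/12 can be confirmed by direct enumeration. The sketch below is illustrative only: it encodes a matrix as a 6-tuple of letters read row by row (an encoding chosen here for convenience) and groups the 720 arrangements into orbits under column permutations and the row swap.

```python
from itertools import permutations

LETTERS = "abcdef"

def orbit(m):
    """All arrangements obtained from m by column permutations and the row swap.

    m is a 6-tuple read row by row: (top row, bottom row).
    """
    out = set()
    for cols in permutations(range(3)):
        top = tuple(m[c] for c in cols)
        bottom = tuple(m[c + 3] for c in cols)
        out.add(top + bottom)   # columns permuted
        out.add(bottom + top)   # columns permuted, then rows swapped
    return frozenset(out)

# Each distinct orbit is one CMI-equivalence class.
classes = {orbit(m) for m in permutations(LETTERS)}
class_sizes = {len(c) for c in classes}
```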

Throughout we shall use the symbol K = K e for the subgroup of G = S 6 generated by the row- and column-swaps 
referred to just now. This is the same subgroup K as we shall use in section [TTT] In the usual cycle notation 



$$K = \langle\, (1,2)(4,5),\ (1,3)(4,6),\ (1,4)(2,5)(3,6) \,\rangle < G, \qquad (13)$$

where we fix for the remainder of this paper the convention that cycles multiply from right to left; so for example
(1,2)(2,3) = (1,2,3) and not (1,3,2) as many authors write. It follows that given any permutation $\sigma \in G$ the
action of K on rows and columns is via left multiplication, meaning our 60 CMI-equivalence classes correspond to
right cosets of K in G; whereas permutations to move us from one right K-coset to another act via multiplication on
the right.
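
The multiplication convention and the order of K are easy to check by machine. In the sketch below the helper names (`from_cycles`, `mul`, `generate`) are ours; permutations are stored as tuples of images, and positions 1 to 3 are taken to be the top row and 4 to 6 the bottom row.

```python
def from_cycles(*cycles, n=6):
    """Permutation of {1..n} from disjoint cycles, stored as a tuple of images."""
    img = list(range(1, n + 1))
    for cyc in cycles:
        for i, x in enumerate(cyc):
            img[x - 1] = cyc[(i + 1) % len(cyc)]
    return tuple(img)

def mul(p, q):
    """Right-to-left product pq: apply q first, then p."""
    return tuple(p[q[i] - 1] for i in range(len(p)))

def generate(gens):
    """Subgroup generated by gens, by repeated left multiplication."""
    identity = tuple(range(1, 7))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

# The three generators of K from (13).
K = generate([
    from_cycles((1, 2), (4, 5)),
    from_cycles((1, 3), (4, 6)),
    from_cycles((1, 4), (2, 5), (3, 6)),
])

# Elements of K that are products of exactly two disjoint transpositions:
# involutions with exactly two fixed points.
identity = tuple(range(1, 7))
double_transpositions = {
    k for k in K
    if k != identity and mul(k, k) == identity
    and sum(k[i] == i + 1 for i in range(6)) == 2
}
```

The set `double_transpositions` collects the column swaps, which are the only elements of K that will matter when single transpositions between cosets are compared later in the section.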

Since we may always make a the top left-hand entry of any of the matrices in $\mathcal{M}_{2\times3}$ by row and/or column swaps,
we set a basic form for our matrices as $M = \begin{pmatrix} a & x & y \\ u & v & w \end{pmatrix}$, where (as sets) {x, y, u, v, w} = {b, c, d, e, f}. This leaves
us with only 5! = 120 possibilities which we further divide in half by requiring that x > y. So our final form for
representative matrices will be:

$$M = \begin{pmatrix} a & x & y \\ u & v & w \end{pmatrix} \quad \text{with } x > y. \qquad (14)$$



This yields our promised 60 representatives $\mathcal{R}_{2\times3}$ in the form (14) for the 60 possible CMI values associated with
the fixed set of probabilities {a, b, c, d, e, f}. We shall both implicitly and explicitly identify this set $\mathcal{R}_{2\times3}$ with a set
of coset representatives for K\G: and given two matrix classes M, N the statement that $M \succ N$ or $M \triangleright N$ will be
taken to mean that the corresponding coset representatives satisfy such a relation. We now need to subdivide $\mathcal{R}_{2\times3}$
as follows. Matrices whose rows and columns are arranged in descending order will be said to be in standard form.
It is straightforward to see that only five of the 60 matrices we have just constructed have this form, namely matrix
classes 1, 7, 13, 25 and 31 from appendix A, which are explicitly:



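$$\begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix},\quad \begin{pmatrix} a & b & d \\ c & e & f \end{pmatrix},\quad \begin{pmatrix} a & b & e \\ c & d & f \end{pmatrix},\quad \begin{pmatrix} a & c & d \\ b & e & f \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix}. \qquad (15)$$

Notice that all of these are in the form (14) with the additional condition that u > v > w. If we allow the bottom
row of any of these to be permuted we obtain $5 = |S_3| - 1$ new matrices which are not in standard form. In all this
gives a total of 30 matrices split into five groups of 6, indexed by each matrix in (15).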

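Now consider matrices in $\mathcal{R}_{2\times3}$ which cannot be in standard form by virtue of having top row entries which are
'too small' but nevertheless which still have the rows in descending order, viz:

$$\begin{pmatrix} a & b & f \\ c & d & e \end{pmatrix},\quad \begin{pmatrix} a & c & f \\ b & d & e \end{pmatrix},\quad \begin{pmatrix} a & d & e \\ b & c & f \end{pmatrix},\quad \begin{pmatrix} a & d & f \\ b & c & e \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} a & e & f \\ b & c & d \end{pmatrix}. \qquad (16)$$

Once again, by permuting the bottom row of each we obtain five new matrices: again a total of 30 matrices split into
five groups of 6, indexed by each matrix in (16). This completes our basic categorization of the subsets of matrices
in $\mathcal{R}_{2\times3}$.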
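
The representatives and the two families (15) and (16) can be listed mechanically. This is a sketch under an encoding of our own choosing: a matrix is a 6-tuple of letters read row by row, and because a > b > ... > f an alphabetically earlier letter stands for a larger probability. The numbering used is the dictionary order of the six-letter words.

```python
from itertools import permutations

# Form (14): a in the top left corner and x > y (so letter x precedes letter y).
reps = sorted(m for m in permutations("abcdef") if m[0] == "a" and m[1] < m[2])

# Class numbers: position in dictionary order, starting from 1.
number = {m: i + 1 for i, m in enumerate(reps)}

def rows_descending(m):
    """Both rows are in descending order of probability."""
    return m[0] < m[1] < m[2] and m[3] < m[4] < m[5]

def columns_descending(m):
    """In every column the top entry is the larger probability."""
    return all(m[i] < m[i + 3] for i in range(3))

# Family (15): standard form, rows and columns both descending.
family_15 = [m for m in reps if rows_descending(m) and columns_descending(m)]

# Family (16): rows descending but not in standard form.
family_16 = [m for m in reps if rows_descending(m) and not columns_descending(m)]
```

Each of the ten matrices found this way heads a group of six obtained by permuting its bottom row, which accounts for all 60 representatives.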

Here are a few results from [3] which help us to classify the relations between the $\mathcal{R}_{2\times3}$ classes. Call two matrices
$M = \begin{pmatrix} a & p & q \\ r & s & t \end{pmatrix}$, $N = \begin{pmatrix} a & x & y \\ u & v & w \end{pmatrix}$ lexicographically ordered if the pair of row vectors (a, p, q, r, s, t) and
(a, x, y, u, v, w) is so ordered (ie the word "apqrst" would precede the word "axyuvw" in an English dictionary).



Lemma 9. We may order the matrices in $\mathcal{R}_{2\times3}$ lexicographically, and majorisation respects that ordering. □

That is to say, if M lies above N lexicographically then N cannot majorise M. Note that this is not the case for
the relation $\triangleright$.

Remark. We have set out this ordering explicitly in appendix A. We shall sometimes refer to matrix classes in $\mathcal{R}_{2\times3}$
by these numbers: when we do so, they will appear in bold figures as per the appendix. Equally we may refer to them
by a right coset from K\G, a representative of each of which is also tabulated in appendix A.



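Lemma 10. Fix any matrix $M = \begin{pmatrix} a & x & y \\ u & v & w \end{pmatrix} \in \mathcal{R}_{2\times3}$ with the additional requirement that u > v > w. Permuting the
elements of the bottom row under the action of the symmetric group $S_3$ we have the following majorisation relations:

$$\begin{pmatrix} a & x & y \\ u & v & w \end{pmatrix} \succ \begin{matrix} \begin{pmatrix} a & x & y \\ u & w & v \end{pmatrix} \\[6pt] \begin{pmatrix} a & x & y \\ v & u & w \end{pmatrix} \end{matrix} \succ \begin{matrix} \begin{pmatrix} a & x & y \\ v & w & u \end{pmatrix} \\[6pt] \begin{pmatrix} a & x & y \\ w & u & v \end{pmatrix} \end{matrix} \succ \begin{pmatrix} a & x & y \\ w & v & u \end{pmatrix} \qquad (17)$$

There are no a priori majorisation relations within the two vertical pairs, with the exception of the instance $46 \succ 47$
in corollary 14. □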
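
For a concrete set of probabilities the extreme relations of lemma 10 can be tested directly. The sketch below uses the usual partial-sum test for majorisation on the column sum vectors (the row sums are unchanged by permuting the bottom row); the numerical values and the choice of the matrix with top row (a, d, e) are hypothetical examples.

```python
from itertools import permutations

def majorises(x, y, tol=1e-12):
    """True if x majorises y: sorted partial sums of x dominate those of y."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    px = py = 0.0
    for xi, yi in zip(xs, ys):
        px += xi
        py += yi
        if px < py - tol:
            return False
    return abs(px - py) <= tol  # equal totals

def col_sums(top, bottom):
    """Column sum vector of the 2x3 matrix with the given rows."""
    return tuple(t + b for t, b in zip(top, bottom))

# Hypothetical sample values with a > b > c > d > e > f and sum 1.
a, b, c, d, e, f = 0.35, 0.25, 0.18, 0.12, 0.07, 0.03

# Top row (a, x, y) = (a, d, e); bottom row entries u > v > w are b, c, f.
top = (a, d, e)
u, v, w = b, c, f

# Column sums for each of the six orderings of the bottom row.
hexagon = {p: col_sums(top, p) for p in permutations((u, v, w))}

leftmost = hexagon[(u, v, w)]   # bottom row descending
rightmost = hexagon[(w, v, u)]  # bottom row ascending
```

With these values the leftmost column sum vector is approximately (0.60, 0.30, 0.10) and the rightmost is approximately (0.38, 0.30, 0.32).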



Note that the rightmost matrix in (17) corresponds to multiplication by the permutation $\varpi = (m_{21}, m_{23})$ of the
matrix $M = (m_{ij})$, that is:

$$\begin{pmatrix} a & x & y \\ w & v & u \end{pmatrix} = \varpi \begin{pmatrix} a & x & y \\ u & v & w \end{pmatrix}.$$

By proposition 3 the fact that A majorises B implies
that I(A) < I(B), so the minimal value for the CMI among the representative matrices in $\mathcal{R}_{2\times3}$ must occur in a
matrix of the form on the left-hand side of (17); conversely the maximum must occur in a matrix of the form on the
right-hand side of (17).

Corollary 11. Fix a choice of probabilities {a, b, c, d, e, f} as above, and consider the matrices in $\mathcal{R}_{2\times3}$ as containing
these fixed values. Then:

(i) there is some M in (15) such that the minimal value for the CMI of any matrix from the set $\mathcal{R}_{2\times3}$ is given
by I(M); and

(ii) there is some A in (16) such that the maximal value for the CMI of any matrix from the set $\mathcal{R}_{2\times3}$ is given
by $I(\varpi(A))$. □
FIG. 2. Representation of the most basic horizontal-transposition-based majorisation relations on $\mathcal{R}_{2\times3}$ together with the
action of the involution £ (the blue arrows)



1. Aside: the basic majorisation structure in pictures 



Using the simple majorisation relations developed in the foregoing discussion we have established a kind of 'honeycomb'
which is the backbone of the entropic partial order $\mathfrak{E}$ across all of K\G. Figure 2 shows the basic hexagonal
frames corresponding to the majorisation orderings in (17). The honeycomb consists of 10 hexagons each containing 6
matrices (one row of 5 slightly below the other reflecting the standard form classification), with each matrix linked
via a hexagonal pattern to the other matrices in its own group. Each hexagonal cell is in itself a diagram of the
Bruhat order on $S_3$. The 2 sets of 5 hexagons come from lemma 10 and the 12 lines of 5 matrices each (consisting of
aligned vertices of the hexagons in their respective groupings) arise from variants of (15) and (16). The red numbers
represent the 'major' element in each hexagon and are in fact all of the matrices in (15) for the top row, and (16) for
the bottom row. Note that we have placed the maximal CMI element 48 at the very bottom point, reflecting the fact
that it lies below every other matrix in the $\triangleright$-partial order. The minimal CMI will occur for a matrix on the very
top row (matrices 1, 7, 13, 25 or 31).

The numbering is as per appendix A, ie the lexicographic ordering. We have stuck to this ordering as much as
possible in the diagram itself, trying to increase numbers within the hexagons as we move down and from left to
right; however in places we have changed it slightly so that the patterns are rendered more clearly. The black arrows
represent the majorisation relations in lemma 10 which arise within each hexagon.

The light blue double-headed arrows represent the action of the inner automorphism £ arising from the unique
element $\omega = (1,6)(2,5)(3,4) \in S_6$ of maximal length [5], which flips 22 pairs of matrix classes and fixes the remaining
16. Since this automorphism respects the binary relations $\triangleright$ on $\mathcal{R}_{2\times3}$ it follows that any entropic binary relations
(including of course majorisation) involving the nodes which have a blue arrow pointing to them will occur in pairs, 
thus considerably simplifying the structure. We shall explain this further in section [Til B| 



D. Transpositions and the classes in $\mathcal{R}_{2\times3} = K\backslash G$

To avoid confusion, the image of an individual matrix $M \in \mathcal{M}_{2\times3}$ in $\mathcal{R}_{2\times3}$ will be denoted by $\overline{M}$ (remember this is
an equivalence class of 12 matrices and corresponds to a unique right coset of K in G), and we shall denote by $M^*$ its
'canonical' representative in the original set $\mathcal{R}_{2\times3}$ of 60 matrices: that is to say, a matrix of the form shown in (14)
or appendix A.

We need to develop necessary and sufficient conditions for the relations $M \succ N$ or $M \triangleright N$ in cases where matrices M
and N are related by a single transposition. However we are dealing with matrix classes, so we need to be very clear
about what we mean by saying that two matrices or matrix classes are 'related by a single transposition'. Let
$\sigma_1, \sigma_2 \in G$ represent cosets $K\sigma_1, K\sigma_2 \in K\backslash G$, and suppose that there is an element $\tau$ which takes $\sigma_1$ to $\sigma_2$. The
translation action of G is on the right, so this means that

$$\sigma_1 \tau = \sigma_2. \qquad (18)$$

Suppose now that we are given $k_1, k_2 \in K$ and we wish to find $\tau'$ taking the representative $k_1\sigma_1$ to $k_2\sigma_2$: we find that
it is

$$\tau' = \sigma_1^{-1} k_1^{-1} k_2 \sigma_1 \tau,$$

and so since the middle $k_i$ product terms are all still just members of K we see that in each case we shall have a
family of 12 distinct elements of G mapping us between the respective cosets. So $\tau$ may well be a transposition, but
its other cohorts will in general not be. However consider the case where $\tau, \tau'$ are both transpositions. Then

$$\tau'\tau^{-1} = \tau'\tau \in \sigma_1^{-1} K \sigma_1,$$

and so in particular $\tau'\tau$ must lie in the same G-conjugacy class as an element of K. But as products of two
transpositions, K only contains the identity element and the elements (1,2)(4,5), (1,3)(4,6) and (2,3)(5,6), which
means that either $\tau' = \tau$ or else $\tau'$ together with $\tau$ effect a column swap, viz.:

$$\sigma_1 \tau' \tau^{-1} = k_1^{-1} k_2 \sigma_1, \qquad (19)$$



which in turn implies that only this one specific 'matching' transposition $\tau'$ can move us between classes where we
already know there is a pair of matrices related by $\tau$. However it is clearly NOT the case that given any element
in the first class and any element in the second class, they will be related by a single transposition to one another.
In summary therefore, when we say that two equivalence class representatives $M^*$ and $N^*$ are related by a single
transposition $\tau$ we are referring to examples where there exist matrices $M, N \in \mathcal{M}_{2\times3}$ with $M \in \overline{M^*}$, $N \in \overline{N^*}$ such
that $M = N^\tau$. This means that our relations do NOT necessarily correspond to single transpositions between the
class representatives of $\mathcal{R}_{2\times3}$. To restrict to these would be completely artificial, depending as it does on our choice
of representatives. Hence in reading the lead-up to proposition 12 and theorem 15 it must be borne in mind that the
matrix M can in principle be ANY matrix in $\mathcal{M}_{2\times3}$. We set this out formally now.

Definition 6. Let M, N be any matrices in $\mathcal{M}_{2\times3}$ corresponding to elements $\sigma_M, \sigma_N$ respectively of G. If there is
a transposition $\tau \in G$ such that $\sigma_M \tau = \sigma_N$ then we shall say that the matrix classes $\overline{M}$ and $\overline{N}$ are related by the
transposition $\tau$.

We shall refer to a transposition as diagonal if it swaps two elements which are neither in the same row nor in the 
same column as one another; vertical if it swaps two elements of the same column: that is to say, the transposition 
only affects row sums; and horizontal if it swaps two elements of the same row: in other words it only affects column 
sums. 

Let T be the set of pairs of distinct $\mathcal{R}_{2\times3}$ classes in which each representative from the first class is related to
at least one representative from the second class by a single transposition. There is a total of |T| = 360 different
pairs (out of a possible $\binom{60}{2} = 1770$): a result which we derive in a moment. Let A be the 60 x 60 (symmetric)
adjacency matrix of the relations embodied in T. By definition A will have 720 non-zero entries. Since transpositions
generate G it follows that each of the 2x3 matrices is eventually in some form the product of transpositions acting
on a fiducial matrix (which we fix throughout to be $\begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}$: see appendix A) and so it is clear that the powers
of A will eventually have non-zero entries everywhere, reflecting the fact that every matrix is related to every other
by a finite chain of transpositions. In fact it is easy to check directly that $A^3$ has no zero entries whereas $A + A^2$
has 720 zeroes: hence 3 is the maximal length of a chain of transpositions linking any two matrix classes in $\mathcal{R}_{2\times3}$.
(For completeness we note that the equivalent figure if we were looking at all 720 matrices would be 5 transpositions
rather than 3). If in addition we restrict just to transpositions which fix a single point, say a as we did in setting up
our $\mathcal{R}_{2\times3}$ classes, then we need at most 4 transpositions to navigate from any given matrix in $\mathcal{R}_{2\times3}$ form, to any
other in that form.

It is possible to derive the number |T| as follows. On any of the 720 matrices in $\mathcal{M}_{2\times3}$ we may act by any of
$\binom{6}{2} = 15$ distinct transpositions, giving a total of 10,800 relations at the level of $\mathcal{M}_{2\times3}$. From the explanation above
it follows that each relation of the form (18) gives rise to at least 12 other single-transposition relations between the
same two cosets (just multiply both sides of (18) on the left successively by elements of K) and so we may divide this
by a factor of 12 immediately. However if two classes $M, N \in \mathcal{R}_{2\times3}$ are related by a horizontal transposition then in
fact there will be (at least) 24 relations between them. To see why this is so, consider the matrix $M = \begin{pmatrix} u & v & w \\ x & y & z \end{pmatrix}$
and without loss of generality assume that the horizontal transposition is $\tau = (1,2)$. So $M^\tau = \begin{pmatrix} v & u & w \\ x & y & z \end{pmatrix}$. But the
class of $M^\tau$ also contains the matrix $\begin{pmatrix} u & v & w \\ y & x & z \end{pmatrix} = M^\sigma$ say, where $\sigma = (4,5)$: this is the same explanation as
that of the column swaps in (19) above. So because (at least) two transpositions are known to map the element M
of $\overline{M}$ to an element of $\overline{M^\tau}$, it follows from the discussion above and that regarding equation (19) that there will be
exactly 2 x 12 = 24 single-transposition relations between the classes. This behaviour cannot occur for diagonal or
vertical transpositions, as is easily seen: it occurs in the 2 x n case for horizontal transpositions only because modulo
row-swap-equivalence, both the top half and the bottom half each 'tell the whole story' of the transposition. Now 6
of the 15 possible transpositions are horizontal (namely the right-action transpositions which would yield the same
results as left action by (1,2), (1,3), (2,3), (4,5), (4,6) and (5,6) in each particular case), with the remaining 9 vertical
or diagonal. So on 6/15 of the relations we divide out by 24, and on the remaining 9/15 we divide by 12. So we have
(10800 * 6/15)/24 + (10800 * 9/15)/12 = 720 ordered pairs. However we want unordered pairs so we divide this by 2,
to obtain |T| = 360 as claimed.
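
The count can also be reproduced by brute force. The sketch below is ours rather than part of the derivation: matrices are 6-tuples of letters read row by row, a class is represented by the alphabetically smallest of its 12 members, and a transposition is realised as the exchange of two entries. It records how many (matrix, transposition) pairs lead from one class to another, and also computes the longest shortest chain of transpositions between classes.

```python
from collections import deque
from itertools import combinations, permutations

def canon(m):
    """Smallest of the 12 copies of m under column permutations and the row swap."""
    best = None
    for cols in permutations(range(3)):
        top = tuple(m[c] for c in cols)
        bottom = tuple(m[c + 3] for c in cols)
        for cand in (top + bottom, bottom + top):
            if best is None or cand < best:
                best = cand
    return best

def swap(m, i, j):
    """Exchange the entries in positions i and j."""
    m = list(m)
    m[i], m[j] = m[j], m[i]
    return tuple(m)

# multiplicity[(C1, C2)]: number of (matrix in C1, transposition) pairs landing in C2.
multiplicity = {}
for m in permutations("abcdef"):
    for i, j in combinations(range(6), 2):
        pair = (canon(m), canon(swap(m, i, j)))
        multiplicity[pair] = multiplicity.get(pair, 0) + 1

# Unordered pairs of classes related by a single transposition.
T = {frozenset(pair) for pair in multiplicity}

# Adjacency lists of the graph on the 60 classes.
neighbours = {}
for c1, c2 in multiplicity:
    neighbours.setdefault(c1, set()).add(c2)

def eccentricity(start):
    """Greatest graph distance from start, by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in neighbours[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist.values())

diameter = max(eccentricity(c) for c in neighbours)
```

The values taken by `multiplicity` separate the horizontal case from the vertical and diagonal cases, and `diameter` is the quantity that the text obtains from the powers of A.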

In the next two sections we shall see that 255 of these 360 pairs do indeed satisfy an entropic binary relation $\triangleright$.



E. Majorisation within $\mathcal{R}_{2\times3}$: necessary and sufficient conditions

We now study the majorisation relations in more detail. First we note necessary and sufficient conditions for
majorisation (and hence $\triangleright$, by proposition 3) between matrices related by a single non-diagonal transposition. Note
that we do NOT necessarily work here with matrices in the form in $\mathcal{R}_{2\times3}$.

Any vertical transposition $\tau = (\alpha, \beta)$, $\alpha > \beta$ may be represented in the form $M = \begin{pmatrix} \alpha & x & y \\ \beta & u & v \end{pmatrix}$ being acted upon
by switching the places of $\alpha$ and $\beta$. Furthermore there exists a matrix in the CMI-equivalence class of $M^\tau$ of the
form $\begin{pmatrix} \alpha & u & v \\ \beta & x & y \end{pmatrix}$: since CMI is invariant in particular if we swap columns 2 and 3, it is evident
(by possibly interchanging M and $M^\tau$) that we may stipulate that x > u, v, y. So having chosen $\alpha$, our choice of x
is fixed. The remaining 3 letters will then have 6 possible orderings, of which exactly 4 satisfy either u < y or v < y.
Written in the above form it is evident that $M \succ M^\tau$ if and only if x + y > u + v (recall that we are only interested
in row sums here since the column sums are fixed, and note that $M^\tau \succ M$ is not possible since x + y cannot be a
priori less than u + v). But x is greater than both u and v, hence a priori majorisation will occur if and only if either
u < y or v < y. Hence for each of the $\binom{6}{2} = 15$ choices of $\{\alpha, \beta\}$, $\alpha > \beta$ there exist exactly 4 majorisation relations,
giving a total of 60 arising from vertical transpositions.

Similarly a horizontal transposition $\tau = (\alpha, \beta)$, $\alpha > \beta$ may be represented as

$$M = \begin{pmatrix} \alpha & \beta & x \\ y & u & v \end{pmatrix} \longmapsto \begin{pmatrix} \beta & \alpha & x \\ y & u & v \end{pmatrix} = M^\tau.$$

Notice first of all that our calculations of CMI differentials will be independent of the rightmost column $\binom{x}{v}$ and so
we may regard this as a majorisation comparison between vectors $(\alpha + y, \beta + u)$ and $(\alpha + u, \beta + y)$, which reduces to
a contest between u and y. We see that $M \succ M^\tau$ if and only if y > u (note that this is the same relation as if we
interchanged $\alpha$ with y and $\beta$ with u so to avoid counting twice we stipulate that x > v). Each of the $\binom{6}{2} = 15$ possible
pairs $\{\alpha, \beta\}$ with $\alpha > \beta$ gives us $\binom{4}{2} = 6$ relations where both x > v and y > u, yielding a total of 90 majorisation
relations in total, arising from horizontal transpositions.

It is clear from the definitions that the respective sets of diagonal, vertical and horizontal majorisation relations are
mutually exclusive. Moreover the property of being diagonal/vertical/horizontal is invariant under the equivalence
relations used to construct the right cosets in $\mathcal{R}_{2\times3}$. Once we have theorem 15 below we shall have proven the
following (the second part is easy to check using a program like SAGE).

Proposition 12. There is a total of 165 distinct (strict) majorisation relations arising exclusively from transpositions
between $\mathcal{R}_{2\times3}$ matrix classes. This comprises 15 from the diagonal transpositions, 60 from the vertical and 90 from the
horizontal. By taking the transitive reduction of the directed graph on 60 nodes whose edges are the 165 majorisation
relations just described, we find that 30 of them are redundant and so there are only 135 covering relations in this
set. □
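
The three counts in proposition 12 can be recovered by enumeration. The sketch below rests on two assumptions of ours: a sum of letters is taken to be a priori at least another if, after sorting, each letter of the first is alphabetically no later than the corresponding letter of the second; and a priori majorisation is tested by requiring each row sum and column sum of N to lie a priori between two of the corresponding sums of M. The helper names are hypothetical.

```python
from itertools import combinations, permutations

def canon(m):
    """Smallest of the 12 copies of m under column permutations and the row swap."""
    best = None
    for cols in permutations(range(3)):
        top = tuple(m[c] for c in cols)
        bottom = tuple(m[c + 3] for c in cols)
        for cand in (top + bottom, bottom + top):
            if best is None or cand < best:
                best = cand
    return best

def dominates(X, Y):
    """Sum of the letters of X is >= that of Y for every a > b > ... > f."""
    return all(x <= y for x, y in zip(sorted(X), sorted(Y)))

def between(n, ms):
    """n lies a priori below some member of ms and above some member of ms."""
    return any(dominates(k, n) for k in ms) and any(dominates(n, l) for l in ms)

def sums(m):
    """Row sums and column sums of m, each as a tuple of letters."""
    rows = [tuple(m[0:3]), tuple(m[3:6])]
    cols = [(m[i], m[i + 3]) for i in range(3)]
    return rows, cols

def majorises(M, N):
    """A priori majorisation test between two arrangements."""
    rM, cM = sums(M)
    rN, cN = sums(N)
    return all(between(n, rM) for n in rN) and all(between(n, cM) for n in cN)

def kind(i, j):
    """Type of the transposition exchanging positions i and j (0..5, row by row)."""
    if i // 3 == j // 3:
        return "horizontal"
    if i % 3 == j % 3:
        return "vertical"
    return "diagonal"

reps = sorted({canon(m) for m in permutations("abcdef")})
relations = {"horizontal": set(), "vertical": set(), "diagonal": set()}
for M in reps:
    for i, j in combinations(range(6), 2):
        N = list(M)
        N[i], N[j] = N[j], N[i]
        if majorises(M, N):
            relations[kind(i, j)].add((M, canon(N)))

counts = {k: len(v) for k, v in relations.items()}
```

Only the class representatives need to be used as starting points, since both the type of a transposition and the majorisation test are unchanged by row and column swaps.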

The two figures GR2 and GR3 (figures 3 and 4) depict schematically all of the possible column sums
(respectively row sums) formed from the 6 probabilities a, b, c, d, e, f in the rows and columns of the matrices in $\mathcal{M}_{2\times3}$.

FIG. 3. Graph GR2 of covering relations between column sum coordinates



It is apparent after a bit of thought that there is a 1-1 correspondence between matrices in $\mathcal{M}_{2\times3}$ and 'compatible'
pairs {c(M), r(M)} where c(M) is a vector of 3 mutually exclusive entries from GR2 (the column sums), and r(M)
is a vector of 2 mutually exclusive entries from GR3 (the row sums), and where we mean by 'compatible' that the
chosen column sums can coexist in a matrix with the chosen row sums. Moreover two matrices M, N are in the same
class in $\mathcal{R}_{2\times3}$ if and only if there are permutations $\sigma \in S_3$, $\tau \in S_2$ such that $c(M) = c(N)^\sigma$ and $r(M) = r(N)^\tau$,
where we view the actions of the groups as usual as simply permuting the coordinates of the vectors.

The arrows in GR2 and GR3 all indicate a 'covering' relation between sums of probabilities: in other words if there 
is an arrow from a quantity X to a quantity Y then X > Y for every matrix in DJIq and there is no quantity Z (within 
the possible column or row sums respectively) such that X > Z > Y for every possible choice of matrix in DJIq. 
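
The covering relations in GR2 and GR3 can be generated rather than drawn. This is a sketch under our own encoding: a node is a tuple of letters standing for their sum, and one sum is a priori at least another when, after sorting, each of its letters is alphabetically no later than the corresponding letter of the other.

```python
from itertools import combinations

def dominates(X, Y):
    """Sum of the letters of X is >= that of Y for every a > b > ... > f."""
    return all(x <= y for x, y in zip(sorted(X), sorted(Y)))

def covering_relations(k, letters="abcdef"):
    """Nodes (k-letter sums) and covering pairs (X, Y) with X > Y and nothing between."""
    nodes = list(combinations(letters, k))
    above = {(X, Y) for X in nodes for Y in nodes if X != Y and dominates(X, Y)}
    covers = {
        (X, Y) for (X, Y) in above
        if not any((X, Z) in above and (Z, Y) in above for Z in nodes)
    }
    return nodes, covers

nodes2, gr2 = covering_relations(2)  # column sums, graph GR2
nodes3, gr3 = covering_relations(3)  # row sums, graph GR3
```

For instance the pair (a+b+c, a+b+d) appears in `gr3`, matching the arrow at the top of the graph of row sums.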

Proposition 13. Let $M, N \in \mathcal{M}_{2\times3}$. Then M majorises N if and only if:

(i) each one of the coordinates of r(N) lies on a (directed) path in GR3 joining some pair of coordinates of r(M);
and

(ii) each one of the coordinates of c(N) lies on a (directed) path in GR2 joining some pair of coordinates of c(M).

Proof. Let $x, y \in \mathbb{R}^n$. By a well-known result on majorisation [5, 4.C.1] we know that $x \succ y$ if and only if y lies
in the convex hull of the n! points formed by all of the permutations of the coordinates of x. In our situation the
column sum vectors are all elements of the 2-simplex $\Sigma_c = \{(x,y,z) \in \mathbb{R}^3 : x + y + z = 1;\ x, y, z \geq 0\}$, whose
vertices are the units on the axes (1, 0, 0), (0, 1, 0) and (0, 0, 1). Similarly the row sum vectors lie inside a 1-simplex
$\Sigma_r = \{(u,v) \in \mathbb{R}^2 : u + v = 1;\ u, v \geq 0\}$ with endpoints (1,0) and (0,1).

Each matrix $M \in \mathcal{M}_{2\times3}$ gives us a column sum vector c(M) which in turn gives (via the permutations of its
coordinates under the action of $S_3$) a suite of six points whose convex hull is a closed, irregular, possibly degenerate
hexagon H(M) lying entirely inside the closed simplex $\Sigma_c$ (see figure 5), whose individual coordinates are all nodes
of GR2. Similarly M gives us a row sum vector r(M) whose convex hull is (under the action of $S_2$) the line segment
$L(M) \subseteq \Sigma_r$ and whose endpoints are r(M) and r(M)*, the image of r(M) under transposing the u, v coordinates:
themselves nodes of GR3. Now let N be any other matrix in $\mathcal{M}_{2\times3}$. By definition 2 the hypothesis that $M \succ N$ is
the same as saying that $r(M) \succ r(N)$ and $c(M) \succ c(N)$ which from above is equivalent to saying that $c(N) \in H(M)$
and that $r(N) \in L(M)$. We remark that each choice of probability vector will give us a different hexagon and a different line segment
for this same M, hence the point is to show that these statements are true for every choice of vector.

So what we need to show is that $r(N) \in L(M)$ if and only if each one of the coordinates of r(N) lies on a (directed)
path in GR3 joining some pair of coordinates of r(M), and that $c(N) \in H(M)$ if and only if each one of the coordinates
of c(N) lies on a (directed) path in GR2 joining some pair of coordinates of c(M). But the arrows in GR2 and GR3
represent order relations between real numbers which hold for all choices of probability vector. So the result follows from
the definitions of L(M) and H(M). □

FIG. 4. Graph GR3 of covering relations between row sum coordinates: note this is the Bruhat poset $S_6^{(3)}$ of figure 2.7 of [2]

Remark. It is clear from the foregoing that [$H(N) \subseteq H(M)$ and $L(N) \subseteq L(M)$] if and only if $M \succ N$.
FIG. 5. 2-simplex $\Sigma_c$ of possible ordered triples of values in GR2, with the hexagon H(M) of a matrix M with c(M) =
(0.6, 0.3, 0.1)



1. The case of more than one transposition 

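Corollary 14. Let $M, N \in \mathcal{M}_{2\times3}$. Suppose that $M \succ N$ but that each element of $\overline{M}$ is separated from every element
of $\overline{N}$ by a product of at least $n \geq 2$ transpositions. Then with just two exceptions, there is an intermediate matrix
class L separated from M by a single transposition and from N by (n - 1) transpositions, such that $M \succ L \succ N$.
The exceptions are (34,47) and (46,47), namely:

$$\begin{pmatrix} a & c & e \\ d & f & b \end{pmatrix} \succ \begin{pmatrix} a & d & e \\ f & b & c \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} a & d & e \\ c & f & b \end{pmatrix} \succ \begin{pmatrix} a & d & e \\ f & b & c \end{pmatrix}.$$

Both of these 'exceptional' covering relations factorise once the finer relation $\triangleright$ is introduced: that is to say they
are no longer covering relations in $\mathfrak{E}$. The factorisation paths are as follows:

$$34 \succ 53 \triangleright 47 \quad\text{and}\quad 46 \triangleright 34 \succ 53 \triangleright 47.$$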



Proof. (The matrices referred to in this proof are reproduced in appendix B).

Construct (by hand, or in a simple computer program) two matrices $M_2$ and $M_3$ representing the transitive
reductions of the partial orders in GR2 and GR3. Since GR2 has 15 nodes and 20 directed edges and GR3 has 20
nodes and 30 directed edges we obtain a 15 x 15-matrix with 20 non-zero entries for $M_2$, and a 20 x 20-matrix with
30 non-zero entries for $M_3$. In order to simplify things for a moment, let us speak only of column sums. Recall by
proposition 13 that all possible (column sum) majorisation relations for matrices M, N will show up as each coordinate
of $c(N) = (n_1, n_2, n_3)$ lying on some directed path between two coordinates of $c(M) = (m_1, m_2, m_3)$. But this is the
same as saying that for each j = 1, 2, 3, there exist distinct $k, l \in \{1, 2, 3\}$ such that some power $M_2^p$ of the matrix
$M_2$ contains a non-zero entry at $(m_k, n_j)$ and another power $M_2^q$ contains a non-zero entry at $(n_j, m_l)$. So if we form
the sum (in reality a finite sum since $M_2$ is nilpotent; but note that we need the identity matrix since the quantities
are $\geq$ themselves):

$$\widehat{M}_2 = \sum_{p=0}^{\infty} M_2^p$$

we need only check the respective entries $(m_k, n_j)$ and $(n_j, m_l)$ of $\widehat{M}_2$ for j = 1, 2, 3 to find whether such k, l exist; if
so then $c(M) \succ c(N)$. Similarly we form

$$\widehat{M}_3 = \sum_{p=0}^{\infty} M_3^p$$

and perform an identical procedure (with only two entries of course this time) to check for row sum majorisation. If
we find non-zero entries for row and column sums in all 5 = 2 + 3 cases then we must have $M \succ N$.

If we now look at the adjacency matrix afforded by this procedure (where we put a 1 in position (i,j) iff matrix i is
found to majorise matrix j under this test) then we produce a 60 x 60-matrix with 423 non-zero entries. Its transitive
reduction T has 134 non-zero entries.

If on the other hand we generate the 60 x 60 adjacency matrix of the directed graph produced by the methods of
proposition 12 (that is to say, only using single transpositions) and take its powers we find a matrix with 421 non-zero
entries, with a transitive reduction T' computed by SAGE to contain 135 entries.

Now if we subtract the second of these two matrices from the first we find that U = T - T' has just 5 non-zero
entries as follows (recall that we use the lexicographic ordering on the $\mathcal{R}_{2\times3}$ matrix classes to index these adjacency
matrices):

$$U_{34,47} = +1; \quad U_{44,47} = -1; \quad U_{45,47} = -1; \quad U_{46,47} = +1; \quad U_{46,48} = -1;$$

which is precisely what is expected if we introduce the two exceptional relations mentioned in the statement of the
corollary, into the relations in T'. (See appendix B 3.) □
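
The matrix sum used in the proof can be illustrated for the column sums. The following is a sketch only, in plain Python with no computer algebra system: it builds the covering matrix of GR2 under our encoding of sums as tuples of letters, accumulates its powers until they vanish, and uses the result to test the column sum condition of proposition 13. The function name `column_sums_majorise` is ours.

```python
from itertools import combinations

def dominates(X, Y):
    """Sum of the letters of X is >= that of Y for every a > b > ... > f."""
    return all(x <= y for x, y in zip(sorted(X), sorted(Y)))

nodes = list(combinations("abcdef", 2))      # the 15 possible column sums
index = {X: i for i, X in enumerate(nodes)}
n = len(nodes)

# Strict order, then its transitive reduction (the covering matrix of GR2).
strict = [[X != Y and dominates(X, Y) for Y in nodes] for X in nodes]
M2 = [[int(strict[i][j] and not any(strict[i][k] and strict[k][j] for k in range(n)))
       for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two n x n integer matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# closure = I + M2 + M2^2 + ... ; the sum is finite because M2 is nilpotent.
identity = [[int(i == j) for j in range(n)] for i in range(n)]
closure = [row[:] for row in identity]
power = identity
nilpotency_index = 0
while True:
    power = matmul(power, M2)
    nilpotency_index += 1
    if not any(any(row) for row in power):
        break
    closure = [[closure[i][j] + power[i][j] for j in range(n)] for i in range(n)]

def column_sums_majorise(cM, cN):
    """Each coordinate of cN lies on a directed path between two coordinates of cM."""
    return all(
        any(closure[index[k]][index[nj]] for k in cM)
        and any(closure[index[nj]][index[l]] for l in cM)
        for nj in cN
    )
```

As a usage example, with column sums given as sorted pairs of letters, the call `column_sums_majorise([("a","d"), ("b","e"), ("c","f")], [("a","f"), ("b","e"), ("c","d")])` compares the two extreme matrices of the hexagon headed by the first matrix in (15).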



So all but two complicated majorisation relations will decompose into smaller majorisation relations arising from
single transpositions. Indeed modulo $\triangleright$ proposition 12 tells the whole story of majorisation as promised in the outline
of the proof of theorem 1. One might hope that such a benign situation would also be the case for the relation $\triangleright$ in
this 2x3 case: and indeed, there are again very few exceptions (we can prove that there are at least two, and possibly
up to five). In order to establish the structure of the poset $\mathfrak{E}$, we need to establish necessary and sufficient conditions
for the occurrence of a relation $\triangleright$ between matrix equivalence classes which are related by a single transposition, and
then as we have just done with majorisation, establish which are the exceptions.



F. The entropic relation $\triangleright$ in $\mathcal{R}_{2\times3}$: necessary and sufficient conditions

We are able to obtain quite a dense partial ordering of the 60 matrix classes in $\mathcal{R}_{2\times3}$ on the basis of the entropic
partial order relation $\triangleright$. Indeed almost one half of the possible pairs of distinct matrix classes are (conjecturally in
the case of 4 pairs - see theorem 17) related to one another: we obtain 830 relations out of a possible $\binom{60}{2} = 1770$. The
transitive reduction of these 830 yields 186 covering relations, as we shall show below. In the last section we found
necessary and sufficient conditions for the majority of these relations which arise through 'horizontal' and 'vertical'
transpositions and the consequent majorisation which occurs. As per proposition 3 the notions of majorisation and
the entropic partial order relation $\triangleright$ are the same thing in these cases: so only the 'diagonal' transpositions remain
to be studied.

Here we develop necessary and sufficient conditions for the relation $\triangleright$ to obtain in the case of a single diagonal
transposition. Given a matrix $M^\sigma = \begin{pmatrix} \alpha & x & y \\ u & \beta & v \end{pmatrix}$ for some $\sigma \in S_6$, a diagonal transposition
$\tau = (\alpha, \beta)$, $\alpha > \beta$ takes this to $M^{\sigma\tau} = \begin{pmatrix} \beta & x & y \\ u & \alpha & v \end{pmatrix}$, representing a class of CMI-invariant matrices of which
one is $\begin{pmatrix} \alpha & u & v \\ x & \beta & y \end{pmatrix}$. Now by possibly interchanging the classes of $M^\sigma$ and $M^{\sigma\tau}$ it is clear that we may require that
x > u. Since we are examining only binary relations between pairs of matrices we are able to require that the pairs be
ordered like this for the purposes of checking whether $\sigma \triangleright \sigma\tau$ or $\sigma\tau \triangleright \sigma$. (Note once again that we do NOT assume
that x > y here, nor that $\alpha = a$ as we are not in general working with matrices in the form in $\mathcal{R}_{2\times3}$).

Theorem 15. Let $M^\sigma = \begin{pmatrix} \alpha & x & y \\ u & \beta & v \end{pmatrix}$ be a matrix as above with $\alpha > \beta$ and $x > u$. For $\tau = (\alpha, \beta)$:

Type A: $\tau\sigma \rhd \sigma \iff v > y$.

Moreover we have a stronger relation as a sub-class of this ('type A majorisation'):

$\tau\sigma \succ \sigma \iff v > x > u > y$.

Conversely,

Type B: $\sigma \rhd \tau\sigma \iff y > x > u > v$.
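
The conditions of theorem 15 can be applied mechanically to a concrete matrix. The following is a minimal sketch, not taken from the paper: the function names `larger`, `normalise` and `classify` are hypothetical, matrices are written as two rows of the letters a to f, and the convention a > b > c > d > e > f is assumed. It normalises the matrix so that $\alpha$ sits top-left and $\beta$ bottom-centre, passes to the other class if needed so that $x > u$, and then reads off the type.

```python
def larger(s, t):
    """Letters stand for probabilities with a > b > ... > f, so 'a' is the largest."""
    return s < t


def normalise(matrix, alpha, beta):
    """Use row and column swaps to put alpha top-left and beta bottom-centre."""
    rows = [list(row) for row in matrix]
    if alpha not in rows[0]:
        rows.reverse()
    col_a, col_b = rows[0].index(alpha), rows[1].index(beta)
    if col_a == col_b:
        raise ValueError("alpha and beta must lie in different columns")
    order = [col_a, col_b, 3 - col_a - col_b]
    return [[row[j] for j in order] for row in rows]


def classify(matrix, alpha, beta):
    """Apply the conditions of theorem 15 to the transposition (alpha, beta).

    Returns (kind, upper): kind is 'A', 'A-majorisation', 'B' or None, and
    upper names the matrix on the left of the relation, either 'original'
    (the input) or 'transposed' (the input with alpha and beta exchanged).
    """
    if not larger(alpha, beta):
        alpha, beta = beta, alpha
    (_, x, y), (u, _, v) = normalise(matrix, alpha, beta)
    sigma, tau_sigma = "original", "transposed"
    if not larger(x, u):
        # Pass to the other class: this exchanges x with u and y with v.
        x, u, y, v = u, x, v, y
        sigma, tau_sigma = tau_sigma, sigma
    if larger(v, y):  # type A: v > y
        if larger(v, x) and larger(u, y):  # v > x > u > y
            return "A-majorisation", tau_sigma
        return "A", tau_sigma
    if larger(y, x) and larger(u, v):  # type B: y > x > u > v
        return "B", sigma
    return None, None


example_a = classify([["a", "b", "d"], ["e", "f", "c"]], "b", "e")
example_b = classify([["a", "b", "c"], ["f", "d", "e"]], "c", "d")
print(example_a, example_b)
```

In the first example the transposed matrix lies above the input (type A); in the second the input lies above its transpose (type B).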

Proof. It is convenient to divide the single-transposition entropic cases into two types as we have done in the statement
of the theorem, which we shall henceforth refer to as type A and type B. Type A is where in the above notation we are
able to say that $\mu_H(r^\tau_\alpha, r_\alpha)\,\mu_H(c^\tau_\alpha, c_\alpha) < \mu_H(r_\beta, r^\tau_\beta)\,\mu_H(c_\beta, c^\tau_\beta)$ for all matrices $M^\sigma \in \mathfrak{M}_\sigma$, which is the same as saying
that $I(M^\sigma) > I(M^{\tau\sigma})$ for all $M^\sigma \in \mathfrak{M}_\sigma$: that is, that $\tau\sigma \rhd \sigma$. Type B is exactly the opposite set of inequalities.



FIG. 6. All fixed relations between the quantities $r_\alpha, r_\beta, c_\alpha, c_\beta, r^\tau_\alpha, r^\tau_\beta, c^\tau_\alpha, c^\tau_\beta$ in the $2 \times n$ case, assuming always that $\alpha > \beta$.
The additional dashed lines complete the picture for the case $n = 3$, where we assume in addition that $x > u$.

Recall the quantities $r_\alpha$, $r_\beta$ etc. from proposition 5 and consider the matrix Q for the special case where m = 2
(and n is any integer). We let $\alpha$, $\beta$ represent any two of the $p_{ij}$ which are in different rows and in different columns
from one another. The assumption that $\alpha > \beta$ implies the relations in figure 6, where a solid downward arrow from k
to l indicates that k > l.

In our case $r_\beta = \beta + u + v$, $c_\beta = \beta + x$, $r^\tau_\alpha = \beta + x + y$, $c^\tau_\alpha = \beta + u$. Since the hypotheses of the theorem include
the requirement that $u < x$, we must have $c^\tau_\alpha < c_\beta$ and it then follows from the solid lines in the diagram that $c^\tau_\alpha$
must be the minimum of the four quantities $r^\tau_\alpha, c^\tau_\alpha, r_\beta, c_\beta$ in (10). We have drawn in the dashed lines to reflect this
additional information for the case n = 3 (only). In addition it is apparent from these formulae that

$$(v > y) \iff (r_\beta + c_\beta > r^\tau_\alpha + c^\tau_\alpha),$$

so by assuming $v > y$ we shall have fulfilled the hypotheses of the second part of proposition 5. Hence the condition
given for type A is sufficient. That $v > y$ is also a necessary condition will follow from the results on type B which
we are about to prove. We remark that since type A and type B are mutually exclusive, it also follows that $v < y$ is
a necessary condition for type B.

Now there are 12 possible orderings for the four values u,v,x,y, remembering that u < x must always hold. In 
reverse lexicographic order these are: 

(I) y > x > v > u
(II) y > x > u > v
(III) y > v > x > u
(IV) x > y > v > u
(V) x > y > u > v
(VI) x > v > y > u
(VII) x > v > u > y
(VIII) x > u > y > v
(IX) x > u > v > y
(X) v > y > x > u
(XI) v > x > y > u
(XII) v > x > u > y

We should point out here that any of the above inequalities may be relaxed to $\geq$: of course if $\alpha$ or $\beta$ (or any other
variable) should happen to be in between two values of u, v, x, y which are equal then they shall also be forced to be
equal to their neighbours, but this does not affect any of the arguments below. However because of this we shall need
to prove strict violations of inequalities (that is, if we are trying to prove a contradiction to some expression $f \geq g$
then we shall need to provide an example where actually $f < g$).

One sees straight away that cases VI, VII, IX, X, XI and XII are all of type A, since $v > y$. We now proceed to
show that case II is the only type B and that the remaining cases (I, III, IV, V and VIII) are neither type A nor
type B. We first claim that

$$r_\beta c_\beta < r^\tau_\alpha c^\tau_\alpha \tag{20}$$

is a necessary and sufficient condition for type B. Consider once again the fundamental expression (9). Recalling
that $c^\tau_\alpha$ is the smallest of the four terms, (20) implies that we must have $r_\beta < r^\tau_\alpha$ and $c_\beta < r^\tau_\alpha$. Hence setting $p = c^\tau_\alpha$,
$q = \min\{c_\beta, r_\beta\}$, $r = \max\{c_\beta, r_\beta\}$ and $s = r^\tau_\alpha$ gets us into the situation of the reverse implication of part (vii) of
lemma 6: namely we know $qr < ps$, so it follows that

$$\frac{\mu_H(r_\beta, r^\tau_\beta)\,\mu_H(c_\beta, c^\tau_\beta)}{\mu_H(r^\tau_\alpha, r_\alpha)\,\mu_H(c^\tau_\alpha, c_\alpha)} < 1,$$

which by definition means type B. So (20) is a sufficient condition for type B. We now show it is also necessary. Using
the explicit formulae above for the row and column sums we see that (20) is the same as the condition

$$v\beta + vx < y\beta + yu, \tag{21}$$

and so we may write the reverse inequality as:

$$\frac{x + \beta}{u + \beta} \geq \frac{y}{v}. \tag{22}$$
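
The passage from (20) to (21) is a one-line expansion. As a check, using only the row and column sums $r_\beta = \beta+u+v$, $c_\beta = \beta+x$, $r^\tau_\alpha = \beta+x+y$, $c^\tau_\alpha = \beta+u$ listed above:

```latex
\begin{align*}
r_\beta c_\beta - r^\tau_\alpha c^\tau_\alpha
  &= (\beta+u+v)(\beta+x) - (\beta+x+y)(\beta+u) \\
  &= v(\beta+x) - y(\beta+u).
\end{align*}
```

So (20) holds exactly when $v\beta + vx < y\beta + yu$, and, assuming $v > 0$, dividing the reverse inequality by $v(u+\beta)$ gives the quotient form (22).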



Since $y > v$ is a necessary condition for type B as observed above, we shall have proven the necessity of (20) for
type B if we can prove the following:

Claim. If (22) holds then $y < v$.

For suppose to the contrary that we have some matrix $M \in \mathfrak{M}_\sigma$ satisfying both (22) and $y > v$, so in particular we
must be in one of the situations I, II, III, IV, V or VIII above. In probability distributions of type I, II, III and VIII
we may set $u = x$ and so $y = v$, a contradiction. In IV and V we may set $y = x$ and $u = v$ and since we can always
construct an example where $\beta > 0$, we have $ux + \beta u > ux + \beta x$, which is a contradiction since $x > u$. This proves the
claim.



Since (21) is equivalent to (20), to complete the proof of the theorem for type B it only remains to show that (21) is
equivalent to the condition II, namely $y > x > u > v$. Now II certainly implies (21), so we just need to prove that (21)
implies II. Our hypotheses include the assumption that $x > u$ so it is enough to show that $y > x$ and $u > v$. Recall
that we are still in one of the cases I, II, III, IV, V or VIII, because $v > y$ would produce an immediate contradiction
to (21) since $x > u$. Suppose that $v > u$ (ie forcing us into cases I, III and IV): then setting $v = y$ we see that (21)
reduces to $vx < vu$, a contradiction to $x > u$. So $u > v$ as required. Similarly suppose that $x > y$ (ie cases V
and VIII): then again setting $v = y$, the inequality (21) contradicts $x > u$. So $y > x$, completing the picture that
condition II is a necessary and sufficient condition for type B.

We now prove that the condition $v > y$ is necessary for type A. Suppose to the contrary that we have type A but
that $y > v$. By figure 6 we know that $c_\beta < r^\tau_\alpha$ and by the formulae above $y > v$ implies $r_\beta < r^\tau_\alpha$, so we are again in
the situation of lemma 6 (vii), with $p = c^\tau_\alpha$, $q = \min\{c_\beta, r_\beta\}$, $r = \max\{c_\beta, r_\beta\}$ and $s = r^\tau_\alpha$. With these definitions,
type A is synonymous with the condition

$$\frac{\mu_H(q)\,\mu_H(r)}{\mu_H(p)\,\mu_H(s)} > 1,$$

where $\mu_H(k)$ abbreviates $\mu_H(k, k + \alpha - \beta)$, and so the lemma implies that $r_\beta c_\beta > r^\tau_\alpha c^\tau_\alpha$, which we know from above is equivalent to (22). But the claim above
showed that this cannot hold under the assumption that $y > v$, yielding the desired contradiction.

This completes the proof of the central assertions of the theorem. It remains to show that if majorisation occurs
for a diagonal transposition then it must be in the situation of condition XII, and conversely that in the sub-class
of type A where $v > x > u > y$ in fact we have majorisation. The latter follows immediately on substituting these
relations into $M^\sigma$ and $M^{\tau\sigma}$. Conversely, consider the column sums: since $\alpha > \beta$ and $x > u$ by hypothesis it follows
that $x + \alpha > u + \alpha > u + \beta$ and $x + \alpha > x + \beta > u + \beta$, hence the columns of $M^{\tau\sigma}$ must always majorise those
of $M^\sigma$. In particular this rules out 'type B majorisation'. So the only type of majorisation which is possible in this
diagonal transposition setup is type A. Suppose then that $M^{\tau\sigma} \succ M^\sigma$. By considering the row sums this time we see
that $\alpha + u + v > \alpha + x + y$, ie $u + v > x + y$. Since $x > u$ we must have that $v > x$ and $u > y$ (to see this, consider
once again the diagram GR2 on page 17). So we may conclude that a necessary condition for type A majorisation is
that $v > x > u > y$. So we have proven the claim about majorisation. □

Corollary 16. Given a probability distribution $a > b > c > d > e > f$ as above, for any ordered pair $\alpha > \beta$ chosen
from $\{a, b, c, d, e, f\}$ there exist precisely 7 diagonal entropic relations, of which exactly one is moreover a majorisation
relation. Since there are $\binom{6}{2} = 15$ such ordered pairs, there exist exactly 105 diagonal entropic relations between the
matrices in $\mathcal{R}_{2\times3}$ arising solely from transpositions. Furthermore 90 of these CANNOT be derived by majorisation
considerations.

Proof. Given any one of the 15 possible pairs $(\alpha, \beta)$ with $\alpha > \beta$: exactly one of the $4!/2 = 12$ configurations of the
remaining letters u, v, x, y (remembering always that $u < x$) satisfies $v < u < x < y$, and 6 satisfy $v > y$, of which one
further satisfies $v > x > u > y$. This means of course that 5 of the remaining configurations satisfy neither type A
nor type B. □
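
The counting in this proof is small enough to enumerate directly. The sketch below is an illustration only (the helper `classify` and the label strings are hypothetical names, not the paper's): it lists the orderings of u, v, x, y compatible with $x > u$, sorts them into the types of theorem 15, and recovers the totals quoted in the corollary.

```python
from collections import Counter
from itertools import permutations


def classify(order):
    """Classify an ordering of 'u', 'v', 'x', 'y' listed from largest to smallest.

    Returns 'A-maj' (type A with majorisation), 'A', 'B' or 'neither',
    using the conditions stated in theorem 15.
    """
    rank = {name: pos for pos, name in enumerate(order)}  # lower pos = larger value

    def gt(s, t):
        return rank[s] < rank[t]

    if gt("v", "y"):  # type A condition: v > y
        if gt("v", "x") and gt("x", "u") and gt("u", "y"):
            return "A-maj"  # v > x > u > y
        return "A"
    if gt("y", "x") and gt("x", "u") and gt("u", "v"):
        return "B"  # y > x > u > v
    return "neither"


# The 4!/2 = 12 orderings compatible with the standing assumption x > u.
orderings = [o for o in permutations("uvxy") if o.index("x") < o.index("u")]
counts = Counter(classify(o) for o in orderings)

pairs = 15  # number of pairs alpha > beta chosen from six letters
relations_per_pair = counts["A"] + counts["A-maj"] + counts["B"]
total_relations = pairs * relations_per_pair
non_majorisation = pairs * (counts["A"] + counts["B"])

print(dict(counts), total_relations, non_majorisation)
```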

We now deal with the situation when there is more than one transposition. 

G. The case of more than one transposition: the 'sporadic 5' 

1. Definition of the 'sporadic 5' and proof of two of the relations 

Recall that a relation $x > y$ in a partial order is called a covering relation if no $z \neq x, y$ may be found such that
$x > z > y$.

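Theorem 17. There are at least 2 and at most 5 covering relations between equivalence classes in $\mathcal{R}_{2\times3}$ which arise
exclusively from products of two or more transpositions. They are:

$15 \rhd 10$: $\begin{pmatrix} a & b & e \\ d & c & f \end{pmatrix} \rhd \begin{pmatrix} a & b & d \\ e & f & c \end{pmatrix}$ (proven below)

$26 \rhd 10$: $\begin{pmatrix} a & c & d \\ b & f & e \end{pmatrix} \rhd \begin{pmatrix} a & b & d \\ e & f & c \end{pmatrix}$ (proven below)

$37 \rhd 11$: $\begin{pmatrix} a & c & f \\ b & d & e \end{pmatrix} \rhd \begin{pmatrix} a & b & d \\ f & c & e \end{pmatrix}$ (conjectured below)

$43 \rhd 11$: $\begin{pmatrix} a & d & e \\ b & c & f \end{pmatrix} \rhd \begin{pmatrix} a & b & d \\ f & c & e \end{pmatrix}$ (conjectured below)

$49 \rhd 11$: $\begin{pmatrix} a & d & f \\ b & c & e \end{pmatrix} \rhd \begin{pmatrix} a & b & d \\ f & c & e \end{pmatrix}$ (conjectured below).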
We shall require an n-dimensional analogue of lemma 5 of [S] . 

Lemma 18. Let $v = (v_i)$, $w = (w_i)$ be two vectors in $\mathbb{R}^n$ with non-negative entries and suppose that $v \succ w$. Let $\phi$
be any strictly log-concave function defined on $\mathbb{R}_+$. Then

$$\prod_{i=1}^{n} \phi(v_i) \leq \prod_{i=1}^{n} \phi(w_i).$$

Proof. By chapter 3, E.1 of [8] the product of $\phi$ on the components is strictly Schur-concave. □



Proof (of the theorem). A general rule similar to that in theorem 15 for cases where matrix classes are related by
two transpositions seems to be very difficult to formulate. So to avoid having to do this we first of all invoke the
following empirical result. We constructed a program in Matlab which easily shows by counterexample that any pairs
not related by a sequence of covering relations arising from proposition 12, theorem 15 and/or the above list of five,
will not have any $\rhd$ relations between them. It never seems to require more than $10^6$ randomly chosen probability
vectors (just using the rand(1,6) function in Matlab with no modifications other than normalisation) in order to find
a counterexample in any given instance; usually of course one needs far fewer than this. So it remains to prove that
the two relations above indeed do hold, and we shall be done.
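
The random search described here is easy to reproduce in outline. What follows is a sketch in Python rather than the author's Matlab program, under stated assumptions: the function names are hypothetical, each sample is sorted into decreasing order after normalisation so that it plays the role of $a > b > \dots > f$, and $M \rhd N$ is read as "the sum of the row and column entropies of M never exceeds that of N", which is the form of the inequality displayed for $26 \rhd 10$ in this proof. The two index patterns are read off from that displayed inequality.

```python
import math
import random


def H(x):
    """Shannon entropy term H(x) = -x log x, with H(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0


def marginal_entropy(matrix, p):
    """Sum of H over the row sums and column sums of a 2x3 index pattern.

    `matrix` holds indices into the sorted vector p (0 for a, ..., 5 for f).
    """
    rows = [sum(p[i] for i in row) for row in matrix]
    cols = [p[matrix[0][j]] + p[matrix[1][j]] for j in range(3)]
    return sum(H(s) for s in rows + cols)


def find_counterexample(m, n, trials=10000, seed=0, tol=1e-12):
    """Search for a sorted probability vector violating the relation m |> n.

    Returns the first violating vector found, or None if every sample
    satisfies marginal_entropy(m) <= marginal_entropy(n).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        raw = sorted((rng.random() for _ in range(6)), reverse=True)
        total = sum(raw)
        p = [x / total for x in raw]
        if marginal_entropy(m, p) > marginal_entropy(n, p) + tol:
            return p
    return None


# Index patterns read off from the inequality displayed for 26 |> 10.
CLASS_10 = [[0, 1, 3], [4, 5, 2]]  # rows a b d / e f c
CLASS_26 = [[0, 2, 3], [1, 5, 4]]  # rows a c d / b f e

print(find_counterexample(CLASS_26, CLASS_10))  # no violation expected
print(find_counterexample(CLASS_10, CLASS_26) is not None)  # reverse direction fails
```

A return value of None is of course only evidence, not a proof, that a relation holds; a returned vector is a genuine counterexample.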

We remark that the first and second relations, and the third and fifth relations, are each pairs of relations which
are images of one another under the automorphism $\xi_w$ (see appendix B 4). So our proof that $26 \rhd 10$ actually points
us to a kind of 'mirror image' proof of the relation $15 \rhd 10$; and we would expect similarly for $37 \rhd 11$ and $49 \rhd 11$.
We first show that $26 \rhd 10$. We have to prove that

$$H(a+e) + H(b+f) + H(c+d) + H(a+b+d) + H(c+e+f) \geq H(a+b) + H(c+f) + H(d+e) + H(a+c+d) + H(b+e+f),$$

which, using the same technique as before, we may rewrite as

$$(b-c)\log\frac{\mu_H^{(b-c)}(a+c)\,\mu_H^{(b-c)}(c+e+f)}{\mu_H^{(b-c)}(a+c+d)\,\mu_H^{(b-c)}(c+f)} + (a-d)\log\frac{\mu_H^{(a-d)}(c+d)}{\mu_H^{(a-d)}(d+e)} \geq 0, \tag{23}$$



where we have written $\mu_H^{(t)}(x)$ for $\mu_H(x, x+t)$. We have added in a "dummy" factor $H(a+c)$ and then taken it out
again, which has enabled us effectively to 'factorise' the path from 26 to 10 via the matrix class 8. The monotonicity
in x of $\mu_H^{(t)}(x)$ for fixed t (lemma 6(i)) shows that the second term is always $\geq 0$ (indeed this is simply the expression
which shows directly that $8 \rhd 10$); so since $a > b > c > d$ the left-hand side of (23) is greater than or equal to

$$(b-c)\log\frac{\mu_H^{(b-c)}(a+c)\,\mu_H^{(b-c)}(c+e+f)\,\mu_H^{(a-d)}(c+d)}{\mu_H^{(b-c)}(a+c+d)\,\mu_H^{(b-c)}(c+f)\,\mu_H^{(a-d)}(d+e)}, \tag{24}$$



and once again by lemma 6(i) we know that $\mu_H^{(b-c)}(a+c) \geq \mu_H^{(b-c)}(a+d)$, so (24) is in turn greater than or equal to
the following expression:

$$(b-c)\log\frac{\mu_H^{(b-c)}(a+d)\,\mu_H^{(b-c)}(c+e+f)\,\mu_H^{(a-d)}(c+d)}{\mu_H^{(b-c)}(a+c+d)\,\mu_H^{(b-c)}(c+f)\,\mu_H^{(a-d)}(d+e)}, \tag{25}$$



which has the added symmetry that the sum of the arguments of the various $\mu_H$'s in the numerator equals the sum
of the arguments in the denominator. So we may compare these vectors of arguments and we find that

$$(a+c+d,\ c+f,\ d+e) \succ (a+d,\ c+d,\ c+e+f), \tag{26}$$

since in $\mathbb{R}^3$ a necessary and sufficient condition that a vector v majorise w is that v contain the overall maximum of
all 6 components of v, w (in this case $a+c+d$) as well as the overall minimum (in this case either $c+f$ or $d+e$).
Since $b > c$ we shall be done if we can show that the argument of the logarithm in (25) is $\geq 1$.
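
The majorisation (26) and the three-dimensional shortcut just quoted can both be checked numerically. The following sketch uses hypothetical helper names and assumes the usual partial-sum definition of majorisation for vectors of equal total; it tests the relation on random sorted probability vectors.

```python
import random


def majorises(v, w, tol=1e-12):
    """True if v majorises w: equal totals and dominating partial sums of the sorted entries."""
    if len(v) != len(w) or abs(sum(v) - sum(w)) > tol:
        return False
    vs, ws = sorted(v, reverse=True), sorted(w, reverse=True)
    pv = pw = 0.0
    for x, y in zip(vs, ws):
        pv += x
        pw += y
        if pv < pw - tol:
            return False
    return True


def max_min_criterion(v, w, tol=1e-12):
    """The shortcut for triples of equal total: v holds the overall max and the overall min."""
    return max(v) >= max(w) - tol and min(v) <= min(w) + tol


rng = random.Random(1)
checks = []
for _ in range(1000):
    raw = sorted((rng.random() for _ in range(6)), reverse=True)
    total = sum(raw)
    a, b, c, d, e, f = (x / total for x in raw)
    v = (a + c + d, c + f, d + e)  # arguments in the denominator of (25)
    w = (a + d, c + d, c + e + f)  # arguments in the numerator of (25)
    checks.append((majorises(v, w), max_min_criterion(v, w)))

print(all(m and q for m, q in checks))
```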
We now claim that

$$\frac{\mu_H^{(b-c)}(a+d)\,\mu_H^{(b-c)}(c+e+f)\,\mu_H^{(a-d)}(c+d)}{\mu_H^{(b-c)}(a+c+d)\,\mu_H^{(b-c)}(c+f)\,\mu_H^{(a-d)}(d+e)} \geq 1. \tag{27}$$

First we note that

$$\frac{\partial^2}{\partial x^2}\bigl(\log \mu_H(x, x+t)\bigr) = -\frac{1}{x(x+t)} < 0,$$

which proves that $\mu_H(x, x+t)$ is strictly log-concave in x for fixed t. So if the terms $\mu_H^{(t)}(X)$ in (27) all had their
t-terms equal then (26) would give us our result, by lemma 18. The strategy therefore is to replace the rightmost top
and bottom terms in (27) respectively by terms of the form $\mu_H^{(b-c)}(X + (c-e))$ and $\mu_H^{(b-c)}(X)$ whose ratio is less than
or equal to $\mu_H^{(a-d)}(c+d) / \mu_H^{(a-d)}(d+e)$: provided that the corresponding majorisation relation still holds then we shall have finished.
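
The log-concavity used here can be derived in three lines, on the assumption (suggested by the references to the identric mean in this proof, but not restated in this section) that $\mu_H(x, x+t)$ is the mean defined by $H(x+t) - H(x) = t\,H'(\mu_H(x, x+t))$ with $H(x) = -x\log x$:

```latex
\begin{align*}
\log \mu_H(x, x+t) &= \frac{(x+t)\log(x+t) - x\log x}{t} - 1, \\
\frac{\partial}{\partial x}\log \mu_H(x, x+t) &= \frac{\log(x+t) - \log x}{t}, \\
\frac{\partial^2}{\partial x^2}\log \mu_H(x, x+t) &= \frac{1}{t}\left(\frac{1}{x+t} - \frac{1}{x}\right) = -\frac{1}{x(x+t)} < 0 .
\end{align*}
```

The first line is just $H'(\mu) = -\log\mu - 1$ solved for $\log\mu$; the remaining two are direct differentiation for $x > 0$ and $t > 0$.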



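Note that by lemma 6

$$\frac{\mu_H^{(b-c)}(c+d)}{\mu_H^{(b-c)}(d+e)} \geq \frac{\mu_H^{(a-d)}(c+d)}{\mu_H^{(a-d)}(d+e)},$$

while part (vi) tells us that for any $\epsilon \in (0, 1 - b - d)$:

$$\frac{\mu_H^{(b-c)}(c+d)}{\mu_H^{(b-c)}(d+e)} > \frac{\mu_H^{(b-c)}(c+d+\epsilon)}{\mu_H^{(b-c)}(d+e+\epsilon)};$$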

that is to say, increasing the arguments of the numerator and denominator by the same amount will decrease the
value of the expression. So we know that such an X, if it exists, must be greater than $d + e$. However we cannot
increase the arguments so as to disrupt the majorisation relation (26), which means that the maximum value of the
new argument in the numerator cannot be greater than $a + c + d$, which in turn translates into the value of X being
less than $a + d + e$. (Note that the minimum in (26) will not be violated because $c + f$ is still a component of the
vector of arguments of the denominator.) So again using lemma 6(vi) we see by continuity that such an X must exist
provided we can prove that

$$\frac{\mu_H^{(b-c)}(a+d+e+(c-e))}{\mu_H^{(b-c)}(a+d+e)} \leq \frac{\mu_H^{(a-d)}(c+d)}{\mu_H^{(a-d)}(d+e)}.$$

Now the internality of the identric mean [5] guarantees that $\mu_H^{(a-d)}(d+e) < \mu_H^{(b-c)}(a+d+e)$ and that $\mu_H^{(a-d)}(c+d) <
\mu_H^{(b-c)}(a+c+d)$, which together with lemma 6(i) gives us the following ordering:

$$\mu_H^{(a-d)}(d+e) < \bigl\{\mu_H^{(a-d)}(c+d),\ \mu_H^{(b-c)}(a+d+e)\bigr\} < \mu_H^{(b-c)}(a+c+d).$$

So if we can show that the sum of the central two terms exceeds that of the outer two terms then by lemma 4 of [9]
we shall be done (alternatively, apply lemma 18 to the function $\phi(x) = x$). This is equivalent to showing that

$$\mu_H^{(a-d)}(c+d) - \mu_H^{(a-d)}(d+e) > \mu_H^{(b-c)}(a+c+d) - \mu_H^{(b-c)}(a+d+e). \tag{28}$$



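But the difference between the pairs of arguments on both sides is the same value $(c - e)$, so this becomes a question
about the relative steepness of $\mu_H^{(a-d)}$ and $\mu_H^{(b-c)}$. We know that $\mu_H^{(t)}(x)$ itself is strictly concave in x by lemma 6(ii),
so we may define new Lagrangian means $\mathfrak{m} \in (d+e,\ c+d)$ and $\mathfrak{M} \in (a+d+e,\ a+c+d)$, associated with $\mu_H^{(a-d)}$ and
$\mu_H^{(b-c)}$ respectively over these intervals of length $c - e$, which by the internality of the
Lagrangian mean [3, VI.2.2] satisfy $\mathfrak{m} < \mathfrak{M}$. Denote by $\mu_H^{(t)\prime}(\xi)$ the slightly more awkward expression $\frac{\partial}{\partial x}\bigl(\mu_H^{(t)}(x)\bigr)\big|_{x=\xi}$.
Dividing (28) through by a factor of $(c - e)$ we obtain

$$\mu_H^{(a-d)\prime}(\mathfrak{m}) > \mu_H^{(b-c)\prime}(\mathfrak{M}),$$

which is what we now must prove. But using lemma 6 once again:

$$\mu_H^{(a-d)\prime}(\mathfrak{m}) > \mu_H^{(b-c)\prime}(\mathfrak{m}) > \mu_H^{(b-c)\prime}(\mathfrak{M}),$$

where the first inequality is from part (iv) and the second from part (ii). This completes the proof that $26 \rhd 10$.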
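
The rewriting of the entropy inequality as the logarithmic expression (23) can be confirmed numerically. The sketch below is not the paper's code: it assumes that $\mu_H(x, x+t)$ is the identric mean of $x$ and $x+t$ (so that $H(x+t) - H(x) = -t\,(1 + \log\mu_H(x, x+t))$), uses hypothetical function names, and compares the left-hand side of (23) with the direct difference of the two entropy sums on random sorted probability vectors.

```python
import math
import random


def H(x):
    """Shannon entropy term H(x) = -x log x, with H(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0


def mu(x, t):
    """Identric mean of x and x + t (assumed here to be the paper's mu_H(x, x + t))."""
    y = x + t
    return math.exp((y * math.log(y) - x * math.log(x)) / t - 1.0)


def lhs_23(a, b, c, d, e, f):
    """Left-hand side of (23), with s = b - c and t = a - d."""
    s, t = b - c, a - d
    first = s * math.log(mu(a + c, s) * mu(c + e + f, s) / (mu(a + c + d, s) * mu(c + f, s)))
    second = t * math.log(mu(c + d, t) / mu(d + e, t))
    return first + second


def entropy_gap(a, b, c, d, e, f):
    """Left side minus right side of the entropy inequality stated for 26 |> 10."""
    left = H(a + e) + H(b + f) + H(c + d) + H(a + b + d) + H(c + e + f)
    right = H(a + b) + H(c + f) + H(d + e) + H(a + c + d) + H(b + e + f)
    return left - right


rng = random.Random(3)
max_error = 0.0
min_gap = float("inf")
for _ in range(1000):
    raw = sorted((rng.random() for _ in range(6)), reverse=True)
    total = sum(raw)
    p = [x / total for x in raw]
    gap = entropy_gap(*p)
    max_error = max(max_error, abs(lhs_23(*p) - gap))
    min_gap = min(min_gap, gap)

print(max_error, min_gap)
```

Agreement of the two quantities checks the algebra of the rewriting only; the non-negativity of the gap on the samples is consistent with, but does not replace, the proof above.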

To prove that $15 \rhd 10$ we need only mimic the above proof, replacing each probability a, b, c, d, e, f by its respective
image f, e, d, c, b, a under the obvious linear extension of $\xi_w$ and then reversing all the signs. With a little care, the
proof goes through exactly as above; we shall just mention the key points. One word of warning: using our abbreviated
notation $\mu_H^{(t)}(x)$ for $\mu_H(x, x+t)$ can be a little confusing because the image under $\xi_w$ will be $\mu_H^{(t)}(\xi_w(x + t))$.

The equivalent of (24) will be:

$$(d-e)\log\frac{\mu_H^{(d-e)}(a+e)\,\mu_H^{(d-e)}(c+e+f)\,\mu_H^{(c-f)}(b+f)}{\mu_H^{(d-e)}(a+b+e)\,\mu_H^{(d-e)}(e+f)\,\mu_H^{(c-f)}(d+f)}, \tag{29}$$

and our corresponding move to obtain something in the form of (25), with comparable vectors of arguments on the
top and the bottom, is to add $c - d$ to the argument of $\mu_H^{(d-e)}(e + f)$, giving us finally the following expression which
we must show is always $\geq 1$:

$$\frac{\mu_H^{(d-e)}(a+e)\,\mu_H^{(d-e)}(c+e+f)\,\mu_H^{(c-f)}(b+f)}{\mu_H^{(d-e)}(a+b+e)\,\mu_H^{(d-e)}(e+f+(c-d))\,\mu_H^{(c-f)}(d+f)}. \tag{30}$$

The remainder of the proof now proceeds in an identical fashion to that for $26 \rhd 10$: we show the existence of an
$X \in (d + f,\ a + d + e)$ such that

$$\frac{\mu_H^{(d-e)}(X + b - d)}{\mu_H^{(d-e)}(X)} = \frac{\mu_H^{(c-f)}(b+f)}{\mu_H^{(c-f)}(d+f)}$$

by showing, using Lagrangian means, that

$$\frac{\mu_H^{(d-e)}(a+d+e+(b-d))}{\mu_H^{(d-e)}(a+d+e)} \leq \frac{\mu_H^{(c-f)}(b+f)}{\mu_H^{(c-f)}(d+f)} \leq \frac{\mu_H^{(d-e)}(b+f)}{\mu_H^{(d-e)}(d+f)},$$

thereby squeezing the desired value between two points on the curve of the monotonically decreasing function
$\mu_H^{(d-e)}(x + (b-d)) / \mu_H^{(d-e)}(x)$. We then use lemma 18 to relate that to the original question. □



2. The three conjectural sporadics 

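Unfortunately I have been unable to prove $37 \rhd 11$, $43 \rhd 11$ and $49 \rhd 11$: the structure of these three is markedly
different from the ones we have just proven, and does not seem to yield to any similar techniques. So we may merely
state the following conjecture:

Conjecture 19. In the above notation,

$37 \rhd 11$: $\begin{pmatrix} a & c & f \\ b & d & e \end{pmatrix} \rhd \begin{pmatrix} a & b & d \\ f & c & e \end{pmatrix}$,

$43 \rhd 11$: $\begin{pmatrix} a & d & e \\ b & c & f \end{pmatrix} \rhd \begin{pmatrix} a & b & d \\ f & c & e \end{pmatrix}$,

$49 \rhd 11$: $\begin{pmatrix} a & d & f \\ b & c & e \end{pmatrix} \rhd \begin{pmatrix} a & b & d \\ f & c & e \end{pmatrix}$.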

As mentioned in the introduction we shall collectively refer to the above three relations together with the corollary
relation $31 \rhd 11$ as C4. Also recall the definition of the binary entropy function $h(x) = H(x) + H(1 - x)$. One
fascinating result of our numerical work, which to some extent highlights the unusual nature of these four relations,
is that if we simply substitute h for H then we obtain a partial order which shares all 826 relations which we
have proven for H, together with seven other relations, but the C4 are broken as may easily be shown by example.
Moreover they are broken around fifty percent of the time. So somehow the extra symmetry of the binary entropy
function, as opposed to the simple entropy function, wipes out precisely these four relations. One might hope that
such a schism in behaviour would point the way to a proof of the conjecture above, although I have been unable to
find one: in particular because h does not lend itself to analysis by the methods of this paper (for example, lemma 6
(ii) fails for h). There are many other functions which also break the C4 exclusively out of the 830, including all
quadratics: one way to prove the above conjecture would be to show that entropy lies on a continuous manifold of
functions well within the family of functions which respect all 830 relations. However such a proof also seems very
difficult because of the convoluted nature of the Lagrangian mean functions which are involved (indeed they are often
only piecewise defined).



3. Summary of the partial order structure 



So we have a total of 262 = 165 + 2 + 90 + 5 relations which in some sense are 'primitive': there are no duplicates
and the list exhausts all possibilities, bearing in mind that three of these are conjectural. In fact as we mentioned in
the proof just now, it is easy to check that any pair not included in the relations obtained by viewing these 262 as
a (nilpotent) adjacency matrix $A_{\mathcal{R}_{2\times3}}$ and then looking at all the powers of $A_{\mathcal{R}_{2\times3}}$ is not able to be a relation, by
constructing a few simple random samples, say in Matlab. Taking the transitive reduction of this larger graph the
overall number of covering edges reduces to 186, made up of 115 majorisation relations and 71 pure entropic relations.
That is to say, the process of taking the transitive reduction of $A_{\mathcal{R}_{2\times3}}$ factorises 50 relations from the majorisation
side and 24 from the entropic side. As mentioned at the beginning of this section these primitive relations give rise
to a total of between 826 and 830 relations overall.

This completes the proof of the analytic side of theorem 1 once we note that the density of the partial order $\mathfrak{E}$ is
given by a number between $\frac{826}{1770}$ and $\frac{830}{1770}$, that is, approximately 0.47 as claimed. It remains to outline the algebraic
structure of $\mathfrak{E}$ in the next chapter. We conclude this chapter with a curious fact about the entropic relations.
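
For reference, the density bounds work out as follows, taking the figures of 826 and 830 relations quoted above against the $\binom{60}{2} = 1770$ possible pairs:

```latex
\frac{826}{1770} \approx 0.4667, \qquad \frac{830}{1770} \approx 0.4689 .
```

Both values round to 0.47, so the stated density does not depend on whether the four conjectural relations hold.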



4. An aside: strange factorisations in the no-man's land between majorisation and $\rhd$

We remark on a phenomenon which arises in the interplay between majorisation and the relation $\rhd$ which perhaps
is a clue to delineating the kind of 'majorisation versus disorder' behaviour which Partovi explores in [10].

Adding the 'sporadic' entropic relations from theorem 17 to the 90 'pure entropic' relations from corollary 16 we
obtain a maximal total of 95 relations which are NOT achievable through majorisation. It turns out that the transitive
reduction of the (somewhat artificial) graph on 60 nodes whose edges are these 95 relations in fact is identical to the
original graph. That is to say, all 95 are covering relations when we consider only the pure entropic relations (ie
no majorisation). Curiously however when the majorisation relations are added in, there are many cases where an
entropic edge ceases to be a covering relation and factors through a majorisation plus an entropic, so we have the
following strange situation for right coset representatives $L, M, N \in \mathcal{R}_{2\times3}$:

$$L \succ M \rhd N \implies L \rhd N \quad \text{but} \quad L \not\succ N\ !!$$



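An example of this occurs if we set $L = 31$, $M = 43 = \begin{pmatrix} a & d & e \\ b & c & f \end{pmatrix}$ and $N = 10 = \begin{pmatrix} a & b & d \\ e & f & c \end{pmatrix}$. Then,
as is easy to check using the conditions in theorem 15 and the discussion preceding proposition 12, $L \succ M \rhd N$ (which
implies $L \rhd N$ by the transitivity of $\rhd$ and proposition 3) but $L \not\succ N$.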

Similarly a kind of 'inverse' situation also occurs, albeit less frequently, namely

$$R \rhd S \succ T \implies R \rhd T \quad \text{but} \quad R \not\succ T.$$



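For completeness we mention an example of this too: take $R = 5 = \begin{pmatrix} a & b & c \\ f & d & e \end{pmatrix}$, $S = 11 = \begin{pmatrix} a & b & d \\ f & c & e \end{pmatrix}$ and
$T = 35 = \begin{pmatrix} a & c & e \\ f & b & d \end{pmatrix}$.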
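
This 'inverse' example can be probed numerically. The sketch below is illustrative only: the helper names are hypothetical, samples are sorted so that $a > b > \dots > f$, and the three index patterns are the matrices for the classes 5, 11 and 35 as read here from the damaged source (rows a b c / f d e, a b d / f c e and a c e / f b d). It checks that S majorises T in both rows and columns, that R does not majorise T in the columns, and that the marginal entropies are nevertheless ordered as $R \rhd S \rhd T$ would require.

```python
import math
import random


def H(x):
    """Shannon entropy term H(x) = -x log x, with H(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0


def row_sums(m, p):
    return [sum(p[i] for i in row) for row in m]


def col_sums(m, p):
    return [p[m[0][j]] + p[m[1][j]] for j in range(3)]


def majorises(v, w, tol=1e-12):
    """Partial-sum test for v majorising w (vectors of equal total)."""
    vs, ws = sorted(v, reverse=True), sorted(w, reverse=True)
    pv = pw = 0.0
    for x, y in zip(vs, ws):
        pv += x
        pw += y
        if pv < pw - tol:
            return False
    return True


def marginal_entropy(m, p):
    return sum(H(s) for s in row_sums(m, p) + col_sums(m, p))


R = [[0, 1, 2], [5, 3, 4]]  # a b c / f d e
S = [[0, 1, 3], [5, 2, 4]]  # a b d / f c e
T = [[0, 2, 4], [5, 1, 3]]  # a c e / f b d

rng = random.Random(2)
s_maj_t, r_cols_maj_t, r_rows_maj_t, chain = [], [], [], []
for _ in range(1000):
    raw = sorted((rng.random() for _ in range(6)), reverse=True)
    total = sum(raw)
    p = [x / total for x in raw]
    s_maj_t.append(majorises(row_sums(S, p), row_sums(T, p))
                   and majorises(col_sums(S, p), col_sums(T, p)))
    r_rows_maj_t.append(majorises(row_sums(R, p), row_sums(T, p)))
    r_cols_maj_t.append(majorises(col_sums(R, p), col_sums(T, p)))
    hr, hs, ht = (marginal_entropy(m, p) for m in (R, S, T))
    chain.append(hr <= hs + 1e-12 and hs <= ht + 1e-12)

print(all(s_maj_t), all(r_rows_maj_t), any(r_cols_maj_t), all(chain))
```

On these samples the row sums of R majorise those of T while the column sums do not, which is the one-sided behaviour the following discussion goes on to describe.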

To get some insight into this we need to show to what extent the two relations $\succ$ and $\rhd$ are the same. Recall from
proposition 3 that majorisation implies $\rhd$, but not conversely: for when $M \succ N$ we know from the fact that entropy
is a Schur-concave function that $H(r(M)) < H(r(N))$ and $H(c(M)) < H(c(N))$. Hence the terms from N entirely
dominate those from M, giving us the entropic relation $M \rhd N$. We now explore the extent to which the converse
might be true.

The relation $M \rhd N$ when $M \not\succ N$ may be thought of as a tug-of-war between the entropy differential of the row
vectors r(M) and r(N), and that of the column vectors c(M) and c(N). In principle it would seem that either column
entropy or row entropy could win the tug-of-war, and indeed each of these situations occurs in examples. However it
turns out for any fixed pair M, N where $M \rhd N$ that a priori either the column vectors always dominate, or the row
vectors always dominate.

Proposition 20. Let $M, N \in \mathcal{M}_{2\times3}$. If $M \rhd N$ then a priori either $r(M) \succ r(N)$ or $c(M) \succ c(N)$.

We remark that the differential (row or column) which does not have a majorisation relation acts as a kind of
'swing' factor: it may be positive or negative in many instances (obviously if it is always positive then we have $M \succ N$),
but there are also many examples where the other factor ALWAYS has the opposite sign, but never gets large
enough to outweigh the effect of the majorisation: indeed this 'other' factor majorises the other way, giving us indeed
a tug-of-war. For an example of this look at any instance of type B: we automatically have row sum majorisation in
the same direction as the entropic relation (see below), and it is immediate that column sum majorisation goes in the
opposite direction. We should also note that there are 30 pairs of matrix classes M, N where neither $r(M) \succ r(N)$
nor $c(M) \succ c(N)$ (nor indeed either of the converses); that is to say, they have no a priori relations even on the
level of row or column sum vectors. By proposition 20 there can be no entropic relations between such matrices.
Furthermore, none of these 30 examples may be realised by a single transposition: indeed they all involve changes in
all three columns and in both rows. In other words they must in general have no common coordinates between the
row sum vectors, nor any between the column sum vectors. We list them for reference in appendix B 3.



Proof. First we note that if $M \succ N$ then the result is known by definition. Furthermore, since majorisation is
transitive it follows that if we know the result to be true for an entropic relation $M \rhd N$ and if $P \succ M$ or $N \succ Q$
then we know the result would be true for $P \rhd N$ or $M \rhd Q$. So we need only focus on entropic relations which
cannot be factored into any product involving a majorisation step. We consider first of all the 90 relations arising
from theorem 15, that is to say, those which arise from a single 'diagonal' transposition $M \mapsto M^\tau$ where $\tau = (\alpha, \beta)$
and as above we represent the matrices by $M = \begin{pmatrix} \alpha & x & y \\ u & \beta & v \end{pmatrix}$ and $M^\tau = \begin{pmatrix} \beta & x & y \\ u & \alpha & v \end{pmatrix}$. Now 'most' of these relations
are of the form $M^\tau \rhd M$ (what we referred to as type A in the proof of theorem 15 above) and we see immediately
that the vector of column sums of $M^\tau$ must majorise that of M by virtue of our constant assumptions that $\alpha > \beta$
and $x > u$. So we are done for all type A entropic relations which arise from a single transposition. For type B we
note again from theorem 15 that a necessary and sufficient condition is $y > x > u > v$, which implies in particular
that $x + y > u + v$, which together with $\alpha > \beta$ means that the vector of row sums of M must majorise that of $M^\tau$.



So it remains to show that the proposition holds for the sporadic 5 relations of theorem 17 and that it holds when
we compose successive entropic relations. The former is easy to show directly (in each of the five sporadics $M \rhd N$ it is
the case that $c(M) \succ c(N)$). That the proposition holds under composition of relations is obvious (by the transitivity
of majorisation) when we consider a sequence of two or more type A relations and/or sporadic relations; or indeed
if we were to consider a sequence consisting only of type B relations. So the only issue is what happens when we
compose a type B with a sporadic or with a type A.

The sporadic relations are easy to deal with: recall from appendix B 3 the 15 type B relations. Comparing this list
with the list of the sporadic instances in theorem 17 we see that only the following sequences can occur between the
two sets: $15 \rhd 10 \rhd 60$, $15 \rhd 10 \rhd 24$, $26 \rhd 10 \rhd 60$, and $26 \rhd 10 \rhd 24$. In particular there are no relations of the
form type B followed by a sporadic. Considering each in turn we are able to show directly (using say the graph GR2)
that the column sum vectors of the left-hand sides always majorise those of the right-hand sides, hence proving the
claim. Indeed the middle two relations $15 \rhd 24$ and $26 \rhd 60$ actually exhibit full majorisation.

The claim for the composition of type B with type A follows from a similar case-by-case analysis of the instances
where they 'match up' (ie where we have a type A relation $X \rhd Y$ followed by a type B relation $Y \rhd Z$, and vice-versa),
using the matrix of 830 relations referred to above. We omit the details. □

III. A purely algebraic construction of the entropic partial order $\mathfrak{E}$



So we have our partial order $\mathfrak{E}$ which has been defined entirely in terms of the entropy function. In this next
section we shall briefly describe a combinatorial or algebraic construction which presupposes nothing about entropy
but whose derivation mimics the case-by-case constructions of proposition 12 and theorem 15. Unfortunately I have
not been able to find a more natural expression for these relations than this: it is tantalisingly close to a closed form
but it seems always to be burdened with some 'exceptional' relations which must be subtracted, no matter how they
are phrased.

When we use another strictly convex or strictly concave function f instead of entropy and define a kind of f-CMI
by substituting f for H in the definitions, then it is these exceptions which come into play: the coefficients of the
summands in (31) will change depending upon the curvature properties of f, yielding new partial orders. Indeed by
studying the simple family of functions $\{\pm x^p : p \in \mathbb{R}\}$ we are able to construct functions which 'tune into' or
'tune out of' various components of the partial order $\mathfrak{E}$, yielding a phenomenon akin to that of the family of Rényi
entropies on vectors which approximates Shannon entropy near 1. For example, it is easy to show that the equivalent
conditions to theorem 15 for $f(x) = -x^2$ are that type A occurs if and only if $v > y$, and type B occurs iff $y > v$;
moreover together with the same majorisation relations as for $f = H$ these generate all of the 1184 relations which
hold for this f. Perhaps the most curious fact is that just like the binary entropy function h defined in the last section,
the only relations which are actually broken from $\mathfrak{E}$ in going from H to f are those we have called C4. As mentioned
in the introduction, this is a vast topic for further study.

We say a quick word on the process of finding this algebraic description, which to some extent ties in with the
statement of theorem 1. The 'shape' of the group ring elements below was discovered by considering the image of
the right coset space $K\backslash G$ under some of the outer automorphisms of G: namely those which send K to a parabolic
subgroup. There are six parabolic subgroups which are isomorphic to K: $\langle(1,2),(2,3),(4,5)\rangle$, $\langle(1,2),(2,3),(5,6)\rangle$,
$\langle(1,2),(3,4),(4,5)\rangle$, $\langle(1,2),(4,5),(5,6)\rangle$, $\langle(2,3),(3,4),(5,6)\rangle$, $\langle(2,3),(4,5),(5,6)\rangle$, and for each one there exist several
outer automorphisms which map K onto it. Choose any such J and a corresponding outer automorphism $\xi$. The
right coset space $J\backslash G$ is isomorphic as a G-set to $K\backslash G$. The image under $\xi$ of each matrix class forms a kind of
pyramid, with the row- and column-swap equivalences being transformed into equivalences between the positions of
a singleton, a pair and a triple of probabilities. Relations between these pyramids turn out to be much easier to
visualise than those between matrices, and the (almost-) cyclic structure of our group ring element $\eta_{\tau,\mathrm{cyc}}$ below was
much more apparent in that form.



A. The abstract combinatorial construction 



Let $G = S_6$ be the symmetric group on the set of six elements {1, 2, 3, 4, 5, 6}. If $\sigma \in S_6$ acts by sending $i$ to $\sigma(i)$ then 
one way of representing $\sigma$ is to write it as the ordered 6-tuple $[\sigma(1), \sigma(2), \sigma(3), \sigma(4), \sigma(5), \sigma(6)]$. On the other hand we 
shall also represent elements of G in standard cycle notation: as in the rest of the paper, elements are understood to act 
on the left. That is to say, for example, that the product (1, 2)(2, 3) is equal to (1, 2, 3) rather than to (1, 3, 2). Define K 
to be the subgroup of $S_6$ generated by the elements (1, 4)(2, 6)(3, 5) and (1, 6, 2, 4, 3, 5). Then K is isomorphic to the 
dihedral group of order 12. The reason for choosing this particular subgroup is that when the vectors are arranged in 
the 2 × 3 matrix form, left multiplication by this subgroup gives exactly the row- and column-swap operations under 
which CMI is invariant: this is clearer if we choose the more obvious generators (1, 2)(4, 5), (1, 3)(4, 6), (1, 4)(2, 5)(3, 6). 
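As a sanity check on the two generating sets for K, the sketch below (hypothetical helper names, left action as in the text) confirms that both generate the same subgroup of order 12, and counts the right cosets Kg directly; since |S_6| = 720 the count should be 720/12 = 60.

```python
from itertools import permutations

N = 6  # points 1..6, stored 0-indexed internally

def perm(*cycles):
    """Permutation of {1..6} built from disjoint cycles, as a tuple of images."""
    img = list(range(N))
    for cyc in cycles:
        for i, x in enumerate(cyc):
            img[x - 1] = cyc[(i + 1) % len(cyc)] - 1
    return tuple(img)

def mul(p, q):
    """Product pq with elements acting on the left: apply q first, then p."""
    return tuple(p[q[i]] for i in range(N))

def generate(gens):
    """Closure of a generating set under multiplication (a finite group)."""
    identity = tuple(range(N))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

# Generators given in the text
K = generate([perm((1, 4), (2, 6), (3, 5)), perm((1, 6, 2, 4, 3, 5))])
# The "more obvious" column-swap and row-swap generators
K_alt = generate([perm((1, 2), (4, 5)), perm((1, 3), (4, 6)),
                  perm((1, 4), (2, 5), (3, 6))])

# Right cosets Kg, each stored as a frozenset of its 12 elements
G = list(permutations(range(N)))
right_cosets = {frozenset(mul(k, g) for k in K) for g in G}

print(len(K), K == K_alt, len(right_cosets))  # 12 True 60
```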

The right coset space K\G contains 60 elements and may be made into a right module for the action of the 
group ring $\mathbb{Z}[G]$ by taking the free abelian group whose generators are the right cosets $K\sigma$ of K\G. Let 1 denote the 
multiplicative identity element of $\mathbb{Z}[G]$, which is identified in the usual way with $1 \cdot 1_G$ where $1_G$ is the identity element 
of G and 1 represents the integer 1. Let $\tau = (\alpha, \beta)$ be any transposition in G, and let $\{r, s, t, u\} = \{1, 2, 3, 4, 5, 6\} \setminus \{\alpha, \beta\}$ 
represent the four elements left after removing $\alpha$ and $\beta$. Assume that we have ordered them so that $r > s > t > u$. 
Let $\psi_\tau = (r, s)$, $\chi_\tau = (s, t)$ and $\gamma_\tau = (\alpha, u, t)(\beta, s, r)$. Let $\mu_\tau$ be $(\alpha, \beta)(r, t)(s, u)$, the unique involution which 
fixes $\gamma_\tau$ and which interchanges $(r, s)$ with $(t, u)$. Finally, define $\sigma_\tau$ to be any one of the 12 elements of G which 
take [1, 2, 3, 4, 5, 6] into the right K-coset of the permutation $[\alpha, r, s, \beta, u, t]$ by right multiplication. 
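The uniqueness claim for the involution attached to each transposition can be tested exhaustively. The sketch below is illustrative only: it assumes that "fixes" means "commutes with", that "interchanges (r, s) with (t, u)" means conjugation, and that r > s > t > u is the usual integer order. It runs over both orderings of the two moved points, since the text does not fix one here. All function names are hypothetical.

```python
from itertools import permutations

N = 6  # points 1..6, stored 0-indexed internally

def perm(*cycles):
    """Permutation of {1..6} built from disjoint cycles, as a tuple of images."""
    img = list(range(N))
    for cyc in cycles:
        for i, x in enumerate(cyc):
            img[x - 1] = cyc[(i + 1) % len(cyc)] - 1
    return tuple(img)

def mul(p, q):
    """Product pq with elements acting on the left: apply q first, then p."""
    return tuple(p[q[i]] for i in range(N))

def inverse(p):
    inv = [0] * N
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)

def tau_data(alpha, beta):
    """The elements psi, chi, gamma, mu attached to the transposition (alpha, beta)."""
    r, s, t, u = sorted(set(range(1, 7)) - {alpha, beta}, reverse=True)
    psi = perm((r, s))
    chi = perm((s, t))
    gamma = perm((alpha, u, t), (beta, s, r))
    mu = perm((alpha, beta), (r, t), (s, u))
    return (r, s, t, u), psi, chi, gamma, mu

IDENTITY = tuple(range(N))
G = list(permutations(range(N)))

def candidates(alpha, beta):
    """All involutions commuting with gamma and conjugating (r, s) to (t, u)."""
    (r, s, t, u), psi, chi, gamma, mu = tau_data(alpha, beta)
    target = perm((t, u))
    return [g for g in G
            if g != IDENTITY and mul(g, g) == IDENTITY
            and mul(g, gamma) == mul(gamma, g)
            and mul(mul(g, psi), inverse(g)) == target]

unique_everywhere = all(candidates(a, b) == [tau_data(a, b)[4]]
                        for a in range(1, 7) for b in range(1, 7) if a != b)
print(unique_everywhere)  # True
```

The reason behind the result is that the centraliser of a product of two disjoint 3-cycles contains only three involutions, each swapping the two cycles, and only one of them carries {r, s} to {t, u}.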

Using the same notation for group ring elements as for their counterparts in G, with coefficients assumed to be 1 
unless otherwise stated, let 

$$\eta_{\tau,\mathrm{horiz}} = (1 + \psi_\tau)(1 + \mu_\tau)(1 + \chi_\tau) - (1 + \psi_\tau \mu_\tau)\chi_\tau,$$

which upon expansion has six terms, and let 



Vr.cyc = CT T (1 + 7r + TrX 1 + Vv)(l + Xr) - ^rll^rXr, 
which has eleven terms. Finally define the group ring element 

$$\eta_\tau = (\eta_{\tau,\mathrm{horiz}} + \eta_{\tau,\mathrm{cyc}})(\tau - 1), \qquad (31)$$

which therefore has a total of 17 terms of the form $z(\tau - 1)$ for some $z$ representing an element $z \in G$. 

Definition 7. Let $\tau$ run over the 15 transpositions in G. Define a binary relation ► on $K \backslash S_6$ by letting each summand 
of each $\eta_\tau$ of the form $z(\tau - 1)$ represent a relation of the form 

Kz ► Kzτ. 

This yields 15 × 17 = 255 binary relations. 

Theorem 21. The transitive closure of the relations ► just defined, together with the five sporadic relations of 
theorem 17, is identical to $\mathfrak{E}$. 

Proof. Take the relations from definition 7 and theorem 17 and generate their transitive reduction: it is identical to 
that of $\mathfrak{E}$ as per appendix B 4. □ 

We may define such an element $\eta_\tau$ for any of the 15 transpositions in G; or we could equally well take a starting 
transposition $\tau$ arbitrarily and then 'navigate' between all of its conjugates by using only adjacent transpositions $\kappa$ 
which share a common element with $\tau$. That is to say, $\kappa = (\alpha, \alpha \pm 1)$ or $\kappa = (\beta \pm 1, \beta)$ with the possibilities obviously 
constrained by where $\alpha, \beta$ lie in the set {1, 2, 3, 4, 5, 6}. Denoting by $g^\kappa$ as usual conjugation of $g \in G$ by $\kappa$ we then 
define $\sigma_\kappa = \sigma_\tau \kappa$, $\psi_\kappa = \psi_\tau^\kappa$, $\chi_\kappa = \chi_\tau^\kappa$, $\gamma_\kappa = \gamma_\tau^\kappa$, $\mu_\kappa = \mu_\tau^\kappa$ and we get the same outcome for $\eta_\tau$ as we would have done 
with the direct definitions above. So it is possible to generate inductively all of the 255 relations from one starting 
point. The adjacency and common element conditions for $\kappa$ are necessary because they preserve the rigidity of the 
orderings $\alpha^\kappa > \beta^\kappa$ and $r^\kappa > s^\kappa > t^\kappa > u^\kappa$. 

All of this raises an intriguing question. Does $\mathfrak{E}$ correspond to any of the well-known orders on quotients of the 
symmetric group? The naive answer is no: our partial order is 'complicated' in the sense that it is not properly 
graded: many covering relations have length > 2 rather than just 1 as with the inherited Bruhat orders on the 
parabolic quotients of the symmetric group from classical Lie algebra theory. So the answer to what $\mathfrak{E}$ 'is' may lie in 
the more general framework of generalised Bruhat quotients [2]. 



B. The unique involution $\varepsilon_\omega$ of the entropic partial order $\mathfrak{E}$ 

Having completed the proof of theorem 1 it remains just to make some final observations about the internal structure 
of $\mathfrak{E}$ which arise when one considers whether its graph has any symmetry. Consider the 'maximal' involution in the 
Bruhat order [2] which is $\omega = (1, 6)(2, 5)(3, 4) \in K$ in the usual cycle notation, and define $\varepsilon_\omega \in \mathrm{Aut}(G)$ to be the 
unique element of the automorphism group whose action is given by conjugation by $\omega$. We prove here a structure 
theorem for the graph of the entropic poset $\mathfrak{E}$ on the elements of K\G. 

Theorem 22. $\varepsilon_\omega$ is the unique automorphism of G which respects the entropic partial order $\mathfrak{E}$ on K\G. 

In other words, $\varepsilon_\omega$ induces a graph automorphism of the directed graph on 60 nodes with 186 edges which is 
conjecturally the graph of covering relations of $\mathfrak{E}$. Moreover if we ignore the 3 covering relations contained in the 
unproven relations C4, this involution still induces an automorphism of the graph of the remaining 183 relations. See 
appendix B for the details of these directed graphs. 



Proof. 'Analytical': In theorems 15 and 17 we derived from first principles the set of relations which arise only from 
the binary relation ▷. In proposition 12 (and see also corollary 14) we explored those relations which arise from 
majorisation and saw that they are subsumed under the first set. This gave a directed graph on 60 nodes with 262 
edges, whose covering relations boil down to 186 edges on the 60 nodes: $\mathfrak{E}$ is defined to be the transitive closure of 
these covering relations. Feeding the adjacency matrix of this graph into the program SAGE (www.sagemath.org) 
gave us a graph automorphism group {1, k} of order 2, which fixes 16 nodes and acts as an involution on the other 44, 
splitting them into 22 orbits of 2 matrix classes each. We should also mention that we confirmed the uniqueness of 
the graph automorphism result using SAUCY (http://vlsicad.eecs.umich.edu/BK/SAUCY/). 

To discover to which (if any) automorphism of the group G this graph automorphism k might correspond we 
proceeded as follows. The normaliser $N_G(K)$ of K in G is just K itself, and no outer automorphism of G can 
fix K: consider for example the row-swap element (1, 4)(2, 5)(3, 6) which must map under any non-trivial outer 
automorphism to a single transposition [10, chapter 7]. But there are no single transpositions in K. So the only 
possible candidates to give by conjugation an (inner) automorphism of G which preserves the structure of K\G are 
the elements of K itself. Of these only $\omega$ respects the binary relation ▷ in every instance (we used the computer 
program GAP (www.gap-system.org) to check this, using orbit sizes). So in fact $k = \varepsilon_\omega$ as claimed. 

'Algebraic': once we know the individual relations constructed in definition 7 we are also able to verify algebraically 
that conjugation by $\omega$ swaps these relations among themselves modulo equivalence by left multiplication by elements 
of K, leaving the total structure unaltered. □ 

Finally we make a few comments on why this involution preserves the single-transpositional relations within the 
partial order, this time from a purely theoretical point of view. That $\varepsilon_\omega$ respects majorisation follows from 
proposition 13 and the observation that the action of $\varepsilon_\omega$ on figures 3 and 4 is to reflect them in a horizontal line passing 
through the centre of each: hence the property of lying on a path joining two nodes is unaltered by the action of $\varepsilon_\omega$. 
It is also possible to show directly that $\varepsilon_\omega$ respects ▷ relations separated by a single transposition, as follows. Given 
any $g \in G$, denote by $g^{\varepsilon_\omega}$ the image $\varepsilon_\omega(g)$ of $g$ under the inner automorphism $\varepsilon_\omega$, or in other words $g^{\varepsilon_\omega} = \omega g \omega^{-1}$. 

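Proposition 23. Suppose $\sigma \triangleright \sigma\tau$ for some transposition $\tau$. Then 

$$\sigma^{\varepsilon_\omega} \triangleright (\sigma\tau)^{\varepsilon_\omega}.$$

Proof. The easiest way to approach this is to use again the criteria from theorem 15 on pairs of matrix classes. For 
any letter $z$ in the set of six letters acted upon by G let us write $\bar z$ for its image under $\omega$: so for example $\bar a = f$, etc. 
Since $\omega \in K$ and $\omega^{-1} = \omega$ it follows that the impact of conjugation by $\omega$ upon a right coset $K\sigma$ is the same as that 
of right multiplication by $\omega$, which in matrix format means we simply replace $z$ with $\bar z$ everywhere. So the image 
under $\varepsilon_\omega$ of the matrix class represented by 

$$M = \begin{pmatrix} \alpha & x & y \\ u & \beta & v \end{pmatrix}$$

will be 

$$\bar M = \begin{pmatrix} \bar\alpha & \bar x & \bar y \\ \bar u & \bar\beta & \bar v \end{pmatrix}.$$

Now $\varepsilon_\omega$ reverses all size relations and so $\alpha > \beta$, $x > u$ become $\bar\alpha < \bar\beta$ and $\bar x < \bar u$. Furthermore the transposition 
$\tau = (\alpha, \beta)$ becomes $\bar\tau = (\bar\beta, \bar\alpha)$. Putting the matrix $\bar M$ back into the form in the hypotheses of theorem 15 requires 
that we choose as a representative of the same class $M'$ instead, obtained by swapping the two rows and then the first two columns: 

$$M' = \begin{pmatrix} \bar\beta & \bar u & \bar v \\ \bar x & \bar\alpha & \bar y \end{pmatrix}.$$

We need to show that $M \triangleright M^\tau$ implies that $M' \triangleright {M'}^{\bar\tau}$ and that $M^\tau \triangleright M$ implies ${M'}^{\bar\tau} \triangleright M'$. Looking again at 
theorem 15 we see that the necessary and sufficient conditions for type A and type B relations give 

$$M^\tau \triangleright M \iff v > y \iff \bar y > \bar v \iff {M'}^{\bar\tau} \triangleright M',$$

and 

$$M \triangleright M^\tau \iff y > x > u > v \iff \bar v > \bar u > \bar x > \bar y \iff M' \triangleright {M'}^{\bar\tau}.$$

This completes the proof. □ 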
IV. Appendices 



A. The matrix class representatives in $R_{2\times 3}$ 

We list the matrix representatives in $R_{2\times 3}$ in lexicographic order together with the lexicographic enumeration we 
have used throughout the paper when referring to them, alongside in each case the element $\sigma \in G = S_6$ in cycle 
notation which represents the appropriate permutation of the fiducial matrix $\begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}$ which we have chosen to 
represent the identity () ∈ G. Note that each $\sigma$ is only chosen up to left multiplication by an element of K. Also, 
since we have chosen to represent the matrices with a in the top left-hand corner and with decreasing top row, the 
set of representative cycles displayed is effectively a copy of $S_5$ modulo a subgroup of order 2. 







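Each entry reads: number : (top row / bottom row), σ. 

 1 : (a b c / d e f), () 
 2 : (a b c / d f e), (56) 
 3 : (a b c / e d f), (45) 
 4 : (a b c / e f d), (465) 
 5 : (a b c / f d e), (456) 
 6 : (a b c / f e d), (46) 
 7 : (a b d / c e f), (34) 
 8 : (a b d / c f e), (34)(56) 
 9 : (a b d / e c f), (354) 
10 : (a b d / e f c), (3654) 
11 : (a b d / f c e), (3564) 
12 : (a b d / f e c), (364) 
13 : (a b e / c d f), (345) 
14 : (a b e / c f d), (3465) 
15 : (a b e / d c f), (35) 
16 : (a b e / d f c), (365) 
17 : (a b e / f c d), (35)(46) 
18 : (a b e / f d c), (3645) 
19 : (a b f / c d e), (3456) 
20 : (a b f / c e d), (346) 
21 : (a b f / d c e), (356) 
22 : (a b f / d e c), (36) 
23 : (a b f / e c d), (3546) 
24 : (a b f / e d c), (36)(45) 
25 : (a c d / b e f), (243) 
26 : (a c d / b f e), (243)(56) 
27 : (a c d / e b f), (2543) 
28 : (a c d / e f b), (26543) 
29 : (a c d / f b e), (25643) 
30 : (a c d / f e b), (2643) 
31 : (a c e / b d f), (2453) 
32 : (a c e / b f d), (24653) 
33 : (a c e / d b f), (253) 
34 : (a c e / d f b), (2653) 
35 : (a c e / f b d), (253)(46) 
36 : (a c e / f d b), (26453) 
37 : (a c f / b d e), (24563) 
38 : (a c f / b e d), (2463) 
39 : (a c f / d b e), (2563) 
40 : (a c f / d e b), (263) 
41 : (a c f / e b d), (25463) 
42 : (a c f / e d b), (263)(45) 
43 : (a d e / b c f), (24)(35) 
44 : (a d e / b f c), (24)(365) 
45 : (a d e / c b f), (2534) 
46 : (a d e / c f b), (26534) 
47 : (a d e / f b c), (25364) 
48 : (a d e / f c b), (264)(35) 
49 : (a d f / b c e), (24)(356) 
50 : (a d f / b e c), (24)(36) 
51 : (a d f / c b e), (25634) 
52 : (a d f / c e b), (2634) 
53 : (a d f / e b c), (254)(36) 
54 : (a d f / e c b), (26354) 
55 : (a e f / b c d), (24635) 
56 : (a e f / b d c), (245)(36) 
57 : (a e f / c b d), (25)(346) 
58 : (a e f / c d b), (26345) 
59 : (a e f / d b c), (25)(36) 
60 : (a e f / d c b), (2635) 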
B. Matrices referred to in the text 

Here we reproduce the often rather large matrices which are referred to in the text in the course of certain calculations, 
but which would make the main body of the paper too cumbersome if they appeared there. 

1. GR2, GR3 and the matrix of all majorisation relations 

First, we adopt as always the lexicographic ordering of the elements of GR2 (ie a + b, a + c, . . . , e + f) and then the 
matrix $M_2$ is as follows: 

$M_2$ = [15 × 15 matrix; its entries are not recoverable from this copy] 
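Again with the lexicographic ordering of the elements of GR3 (ie a + b + c, a + b + d, . . . , d + e + f) the matrix $M_3$ 
is as follows: 

$$M_3 = \left(\begin{array}{*{20}{c}}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right)$$

The 60 × 60 matrix reflecting all transpositions is now easily generated using the criteria in the text, once we form 
the sums of powers of these two matrices. 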
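One reading of these matrices, offered here as an assumption rather than as the authors' stated definition, is that the (i, j) entry is 1 exactly when the j-th sum is obtained from the i-th by replacing a single letter with its alphabetical successor. A minimal Python sketch under that assumption (the function name `cover_matrix` is hypothetical) generates such a matrix for sums of k letters:

```python
from itertools import combinations

LETTERS = "abcdef"

def cover_matrix(k):
    """0/1 matrix on k-letter sums in lexicographic order.

    Assumed rule: entry (i, j) is 1 when sum j arises from sum i by
    replacing one letter with the next letter of the alphabet.
    """
    subsets = list(combinations(LETTERS, k))  # already lexicographic
    index = {s: i for i, s in enumerate(subsets)}
    n = len(subsets)
    matrix = [[0] * n for _ in range(n)]
    for s in subsets:
        for x in s:
            pos = LETTERS.index(x)
            if pos + 1 < len(LETTERS) and LETTERS[pos + 1] not in s:
                t = tuple(sorted((set(s) - {x}) | {LETTERS[pos + 1]}))
                matrix[index[s]][index[t]] = 1
    return subsets, matrix

triples, M3 = cover_matrix(3)   # 20 x 20
doubles, M2 = cover_matrix(2)   # 15 x 15, by analogy only

print(sum(map(sum, M3)), sum(map(sum, M2)))  # 30 20
```

For k = 3 this produces a strictly upper triangular 20 × 20 matrix with 30 non-zero entries; for example the row of a + c + e has its ones in the columns of a + c + f, a + d + e and b + c + e. Whether the 15 × 15 matrix for k = 2 follows the same rule is an assumption.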
2. Majorisation via transpositions 

The easiest way to give the structure of the 165 single-transposition majorisation relations referred to in 
proposition 12 is to represent them as 165 ordered pairs using the numbering in appendix A. From this set of pairs it is 
straightforward to rebuild the adjacency matrix of the partial order: we just put a '1' in each place whose 
entry coordinates are given by one of these pairs, and zeroes elsewhere. 
The 165 relations are thus: 

(1,2), (1,3), (2,4), (3,4), (2,5), (3,5), (1,6), (4,6), (5,6), (7,8), (7,9), (4,10), (8,10), (9,10), (8,11), 
(9,11), (6,12), (7,12), (10,12), (11,12), (8,14), (13,14), (13,15), (2,16), (14,16), (15,16), (11,17), (14,17), 
(15,17), (5,18), (13,18), (16,18), (17,18), (13,19), (7,20), (19,20), (15,21), (19,21), (1,22), (20,22), (21,22), 
(9,23), (20,23), (21,23), (3,24), (19,24), (22,24), (23,24), (25,26), (9,27), (25,27), (3,28), (26,28), (27,28), 
(11,29), (26,29), (27,29), (5,30), (25,30), (28,30), (29,30), (26,32), (31,32), (15,33), (31,33), (1,34), (7,34), 
(32,34), (33,34), (11,35), (17,35), (29,35), (32,35), (33,35), (6,36), (12,36), (18,36), (30,36), (31,36), (34,36), 
(35,36), (31,37), (25,38), (37,38), (15,39), (33,39), (37,39), (2,40), (8,40), (16,40), (34,40), (38,40), (39,40), 
(9,41), (27,41), (38,41), (39,41), (4,42), (10,42), (14,42), (28,42), (37,42), (40,42), (41,42), (31,43), (25,44), 
(43,44), (13,45), (43,45), (7,46), (44,46), (45,46), (5,47), (11,47), (18,47), (30,47), (35,47), (44,47), (45,47), 
(6,48), (12,48), (17,48), (29,48), (36,48), (43,48), (46,48), (47,48), (31,49), (26,50), (32,50), (49,50), (13,51), 
(49,51), (8,52), (14,52), (50,52), (51,52), (3,53), (9,53), (28,53), (34,53), (50,53), (51,53), (4,54), (10,54), 
(16,54), (27,54), (49,54), (52,54), (53,54), (25,55), (26,56), (55,56), (7,57), (55,57), (8,58), (56,58), (57,58), 
(1,59), (56,59), (57,59), (2,60), (55,60), (58,60), (59,60). 

It is easy to verify that the following 30 pairs are not covering relations even just within the pure majorisation 
framework (eg observe that (1,6) factorises into the product (1,2) → (2,4) → (4,6)): 

(1,6), (7,12), (13,18), (19,24), (25,30), (11,35), (6,36), (31,36), (15,39), (2,40), (8,40), (9,41), (4,42), 
(14,42), (37,42), (5,47), (11,47), (6,48), (12,48), (17,48), (29,48), (43,48), (26,50), (8,52), (3,53), (9,53), 
(4,54), (27,54), (49,54), (55,60). 
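The test for a non-covering pair can be sketched as follows: a pair (i, j) is redundant when j is still reachable from i after the direct edge has been removed. The code below is an illustrative sketch with hypothetical function names, run on the first nine pairs of the list of 165 relations, which already exhibits the factorisation of (1,6).

```python
def reachable(edges, src, dst, skip):
    """True if dst can be reached from src without using the edge `skip`."""
    successors = {}
    for a, b in edges:
        if (a, b) != skip:
            successors.setdefault(a, []).append(b)
    stack, seen = [src], {src}
    while stack:
        x = stack.pop()
        if x == dst:
            return True
        for y in successors.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False

def transitive_reduction(edges):
    """Edges of an acyclic relation that are not implied by a longer chain."""
    return [e for e in edges if not reachable(edges, e[0], e[1], skip=e)]

# The first nine pairs of the list of 165 majorisation relations
sample = [(1, 2), (1, 3), (2, 4), (3, 4), (2, 5), (3, 5), (1, 6), (4, 6), (5, 6)]
covering = transitive_reduction(sample)
removed = [e for e in sample if e not in covering]
print(removed)  # [(1, 6)]
```

Applying `transitive_reduction` to the full list of pairs should, if the lists are transcribed correctly, reproduce the split into covering and non-covering relations described here.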

The following 20 additionally will disappear (ie they factorise as a sequence of other relations) once all of the 
entropic relations are introduced: 

(5,18), (1,22), (3,24), (5,30), (1,34), (7,34), (32,35), (33,35), (12,36), (16,40), (38,40), (10,42), (44,47), 
(45,47), (46,48), (28,53), (51,53), (10,54), (1,59), (2,60). 

As mentioned in the text, the two 'exceptional' relations referred to in corollary 14 do not give rise to any covering 
relations; hence we are left with just 115 covering relations in the entropic partial order $\mathfrak{E}$ which arise solely from 
majorisation. 



3. Entropic partial order 

As in the previous section we shall use ordered pairs ('sparse matrix representation') to give the set of all 90 
non-majorisation entropic relations arising from single transpositions as per theorem 15, viz.: 

(31,10), (43,10), (5,11), (37,12), (49,12), (26,14), (32,14), (25,16), (44,16), (6,17), (50,17), (56,17), 
(12,18), (38,18), (55,18), (31,19), (37,19), (25,20), (38,20), (43,21), (49,21), (16,22), (26,22), (50,22), 
(4,23), (44,23), (55,23), (10,24), (32,24), (56,24), (15,27), (33,27), (13,28), (45,28), (6,29), (21,29), 
(39,29), (12,30), (19,30), (51,30), (46,34), (5,35), (23,35), (41,35), (52,35), (58,35), (20,36), (57,36), 
(45,39), (51,39), (22,40), (52,40), (3,41), (46,41), (57,41), (24,42), (58,42), (1,46), (24,47), (40,47), 
(53,47), (60,47), (22,48), (42,48), (54,48), (59,48), (38,50), (44,50), (2,52), (20,52), (46,52), (41,53), 
(59,53), (23,54), (60,54), (31,55), (49,55), (37,56), (43,56), (13,57), (51,57), 
(4,58), (19,58), (45,58), (15,59), (28,59), (39,59), (10,60), (21,60), (33,60). 

In the proof of theorem 15 we split these 90 single-transposition entropic relations into two subsets: type A and 
type B. There are only 15 type B relations (the remainder above are type A), one corresponding to each of the 15 
transpositions (a, b), (a, c) etc. We list them here for reference: 

(a, b) : 28 ▷ 59; (a, c) : 10 ▷ 60; (a, d) : 4 ▷ 58; (a, e) : 2 ▷ 52; (a, f) : 1 ▷ 46; (b, c) : 12 ▷ 30; (b, d) : 6 ▷ 29; 
(b, e) : 5 ▷ 35; (b, f) : 3 ▷ 41; (c, d) : 5 ▷ 11; (c, e) : 6 ▷ 17; (c, f) : 4 ▷ 23; (d, e) : 12 ▷ 18; (d, f) : 10 ▷ 24; 
(e, f) : 16 ▷ 22. 

Returning to the total set of entropic relations above, we need now to add in the 5 'sporadic' multiple-transposition 
relations (3 of which are conjectural) from theorem 17: 

(15,10), (26,10), (37,11), (43,11), (49,11). 

The transitive reduction of this total set of 95 relations is just the set again - that is to say, all 95 relations are 
covering relations just within the context of 'purely entropic' relations. However 24 of them will factorise once we 
introduce the majorisation relations, as follows: 
(31,10), (37,12), (49,12), (26,14), (25,16), (38,18), (55,18), (31,19), 
(25,20), (26,22), (32,24), (15,27), (13,28), (19,30), (51,30), (5,35), 
(20,36), (57,36), (22,48), (59,48), (31,55), (13,57), (15,59), (33,60). 

Notice that all but one of these (namely (5,35)) are type A. So we are left with 71 covering 'purely entropic' 
relations (ie which are not ascribable to majorisation), which together with the 115 majorisation relations in the 
previous section, gives us our complete set of 186 covering relations for the entropic partial order $\mathfrak{E}$. We give this 
complete set in the next section. 
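The bookkeeping behind the counts 115, 71 and 186 can be replayed mechanically from the lists in this appendix. The sketch below only performs set arithmetic on the pairs as printed; it does not re-derive which relations factorise. The variable names are hypothetical.

```python
import re

def pairs(text):
    """Parse '(i,j), (k,l), ...' into a list of integer pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"\((\d+),\s*(\d+)\)", text)]

majorisation = pairs("""
(1,2), (1,3), (2,4), (3,4), (2,5), (3,5), (1,6), (4,6), (5,6), (7,8), (7,9), (4,10), (8,10), (9,10), (8,11),
(9,11), (6,12), (7,12), (10,12), (11,12), (8,14), (13,14), (13,15), (2,16), (14,16), (15,16), (11,17), (14,17),
(15,17), (5,18), (13,18), (16,18), (17,18), (13,19), (7,20), (19,20), (15,21), (19,21), (1,22), (20,22), (21,22),
(9,23), (20,23), (21,23), (3,24), (19,24), (22,24), (23,24), (25,26), (9,27), (25,27), (3,28), (26,28), (27,28),
(11,29), (26,29), (27,29), (5,30), (25,30), (28,30), (29,30), (26,32), (31,32), (15,33), (31,33), (1,34), (7,34),
(32,34), (33,34), (11,35), (17,35), (29,35), (32,35), (33,35), (6,36), (12,36), (18,36), (30,36), (31,36), (34,36),
(35,36), (31,37), (25,38), (37,38), (15,39), (33,39), (37,39), (2,40), (8,40), (16,40), (34,40), (38,40), (39,40),
(9,41), (27,41), (38,41), (39,41), (4,42), (10,42), (14,42), (28,42), (37,42), (40,42), (41,42), (31,43), (25,44),
(43,44), (13,45), (43,45), (7,46), (44,46), (45,46), (5,47), (11,47), (18,47), (30,47), (35,47), (44,47), (45,47),
(6,48), (12,48), (17,48), (29,48), (36,48), (43,48), (46,48), (47,48), (31,49), (26,50), (32,50), (49,50), (13,51),
(49,51), (8,52), (14,52), (50,52), (51,52), (3,53), (9,53), (28,53), (34,53), (50,53), (51,53), (4,54), (10,54),
(16,54), (27,54), (49,54), (52,54), (53,54), (25,55), (26,56), (55,56), (7,57), (55,57), (8,58), (56,58), (57,58),
(1,59), (56,59), (57,59), (2,60), (55,60), (58,60), (59,60)""")
maj_not_covering = pairs("""
(1,6), (7,12), (13,18), (19,24), (25,30), (11,35), (6,36), (31,36), (15,39), (2,40), (8,40), (9,41), (4,42),
(14,42), (37,42), (5,47), (11,47), (6,48), (12,48), (17,48), (29,48), (43,48), (26,50), (8,52), (3,53), (9,53),
(4,54), (27,54), (49,54), (55,60)""")
maj_factorise = pairs("""
(5,18), (1,22), (3,24), (5,30), (1,34), (7,34), (32,35), (33,35), (12,36), (16,40), (38,40), (10,42), (44,47),
(45,47), (46,48), (28,53), (51,53), (10,54), (1,59), (2,60)""")
entropic = pairs("""
(31,10), (43,10), (5,11), (37,12), (49,12), (26,14), (32,14), (25,16), (44,16), (6,17), (50,17), (56,17),
(12,18), (38,18), (55,18), (31,19), (37,19), (25,20), (38,20), (43,21), (49,21), (16,22), (26,22), (50,22),
(4,23), (44,23), (55,23), (10,24), (32,24), (56,24), (15,27), (33,27), (13,28), (45,28), (6,29), (21,29),
(39,29), (12,30), (19,30), (51,30), (46,34), (5,35), (23,35), (41,35), (52,35), (58,35), (20,36), (57,36),
(45,39), (51,39), (22,40), (52,40), (3,41), (46,41), (57,41), (24,42), (58,42), (1,46), (24,47), (40,47),
(53,47), (60,47), (22,48), (42,48), (54,48), (59,48), (38,50), (44,50), (2,52), (20,52), (46,52), (41,53),
(59,53), (23,54), (60,54), (31,55), (49,55), (37,56), (43,56), (13,57), (51,57),
(4,58), (19,58), (45,58), (15,59), (28,59), (39,59), (10,60), (21,60), (33,60)""")
sporadic = pairs("(15,10), (26,10), (37,11), (43,11), (49,11)")
ent_factorise = pairs("""
(31,10), (37,12), (49,12), (26,14), (25,16), (38,18), (55,18), (31,19),
(25,20), (26,22), (32,24), (15,27), (13,28), (19,30), (51,30), (5,35),
(20,36), (57,36), (22,48), (59,48), (31,55), (13,57), (15,59), (33,60)""")

# Majorisation pairs that survive both rounds of removal
maj_cover = set(majorisation) - set(maj_not_covering) - set(maj_factorise)
# 'Purely entropic' pairs (single-transposition plus sporadic) that survive
ent_cover = (set(entropic) | set(sporadic)) - set(ent_factorise)

print(len(maj_cover), len(ent_cover), len(maj_cover | ent_cover))  # 115 71 186
```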

Finally, we list the 30 pairs of matrices mentioned after proposition 20 where neither row sum majorisation nor 
column sum majorisation obtain in either direction (so in particular no entropic relation would even be possible): 
{14,28}, {14,59}, {16,27}, {16,29}, {17,28}, {17,59}, {17,60}, {22,27}, {22,29}, {22,35}, 
{22,41}, {22,52}, {22,58}, {23,34}, {23,40}, {23,59}, {23,60}, {24,29}, {24,35}, {24,52}, 
{24,58}, {34,58}, {35,59}, {35,60}, {41,59}, {41,60}, {42,47}, {47,54}, {52,59}, {53,58}. 



4. The entropic partial ordering $\mathfrak{E}$ on $K\backslash G = R_{2\times 3}$ 
Here is the final set of 186 covering relations: 

(1,2), (1,3), (2,4), (3,4), (2,5), (3,5), (4,6), (5,6), (7,8), (7,9), (4,10), (8,10), 
(9,10), (15,10), (26,10), (43,10), (5,11), (8,11), (9,11), (37,11), (43,11), (49,11), (6,12), (10,12), 
(11,12), (8,14), (13,14), (32,14), (13,15), (2,16), (14,16), (15,16), (44,16), (6,17), (11,17), (14,17), 
(15,17), (50,17), (56,17), (12,18), (16,18), (17,18), (13,19), (37,19), (7,20), (19,20), (38,20), (15,21), 
(19,21), (43,21), (49,21), (16,22), (20,22), (21,22), (50,22), (4,23), (9,23), (20,23), (21,23), (44,23), 
(55,23), (10,24), (22,24), (23,24), (56,24), (25,26), (9,27), (25,27), (33,27), (3,28), (26,28), (27,28), 
(45,28), (6,29), (11,29), (21,29), (26,29), (27,29), (39,29), (12,30), (28,30), (29,30), (26,32), (31,32), 
(15,33), (31,33), (32,34), (33,34), (46,34), (17,35), (23,35), (29,35), (41,35), (52,35), (58,35), (18,36), 
(30,36), (34,36), (35,36), (31,37), (25,38), (37,38), (33,39), (37,39), (45,39), (51,39), (22,40), (34,40), 
(39,40), (52,40), (3,41), (27,41), (38,41), (39,41), (46,41), (57,41), (24,42), (28,42), (40,42), (41,42), 
(58,42), (31,43), (25,44), (43,44), (13,45), (43,45), (1,46), (7,46), (44,46), (45,46), (18,47), (24,47), 
(30,47), (35,47), (40,47), (53,47), (60,47), (36,48), (42,48), (47,48), (54,48), (31,49), (32,50), (38,50), 
(44,50), (49,50), (13,51), (49,51), (2,52), (14,52), (20,52), (46,52), (50,52), (51,52), (34,53), (41,53), 
(50,53), (59,53), (16,54), (23,54), (52,54), (53,54), (60,54), (25,55), (49,55), (26,56), (37,56), (43,56), 
(55,56), (7,57), (51,57), (55,57), (4,58), (8,58), (19,58), (45,58), (56,58), (57,58), (28,59), (39,59), 
(56,59), (57,59), (10,60), (21,60), (58,60), (59,60). 

Set out as an adjacency matrix it represents a directed graph on 60 nodes with 186 edges, and as mentioned in 
theorem 22 this graph has a unique automorphism of order 2 which we called $\varepsilon_\omega$, where $\omega$ is the unique involution 
(1, 6)(2, 5)(3, 4) of maximal length in the subgroup K of $G = S_6$ and $\varepsilon_\omega$ is the inner automorphism of G which is given 
by conjugation by $\omega$ within G. The orbits of $\varepsilon_\omega$ consist of 22 pairs of nodes which are swapped by $\varepsilon_\omega$ together with 
the remaining 16 nodes which are fixed by its action. We now give the orbits, using the matrix-numbering notation 
above: 

{2,3}, {8,9}, {13,25}, {14,27}, {15,26}, {16,28}, {17,29}, {18,30}, {19,55}, {20,57}, {21,56}, {22,59}, 
{23,58}, {24,60}, {32,33}, {37,49}, {38,51}, {39,50}, {40,53}, {41,52}, {42,54}, {44,45}, 
{1}, {4}, {5}, {6}, {7}, {10}, {11}, {12}, {31}, {34}, {35}, {36}, {43}, {46}, {47}, {48}. 

Another way to say this is that conjugation by $\omega$ amounts to an involution $\chi_\omega$ in the symmetric group on K\G: 
that is to say $\chi_\omega \in S_{60}$ and in cycle notation it has the form: 

$\chi_\omega$ = (1)(2, 3)(4)(5)(6)(7)(8, 9)(10)(11)(12)(13, 25)(14, 27)(15, 26)(16, 28) · 
· (17, 29)(18, 30)(19, 55)(20, 57)(21, 56)(22, 59)(23, 58)(24, 60)(31)(32, 33) · 
· (34)(35)(36)(37, 49)(38, 51)(39, 50)(40, 53)(41, 52)(42, 54)(43)(44, 45)(46)(47)(48). 
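That this involution really is an automorphism of the graph of 186 covering relations can be checked directly from the two lists in this section. The sketch below (hypothetical variable names) builds the permutation from its 22 transposed pairs and tests whether the image of the edge set is the edge set itself. It verifies the automorphism property only, not uniqueness, for which the authors used SAGE and SAUCY.

```python
import re

def pairs(text):
    """Parse '(i,j), (k,l), ...' into a list of integer pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"\((\d+),\s*(\d+)\)", text)]

covering = pairs("""
(1,2), (1,3), (2,4), (3,4), (2,5), (3,5), (4,6), (5,6), (7,8), (7,9), (4,10), (8,10),
(9,10), (15,10), (26,10), (43,10), (5,11), (8,11), (9,11), (37,11), (43,11), (49,11), (6,12), (10,12),
(11,12), (8,14), (13,14), (32,14), (13,15), (2,16), (14,16), (15,16), (44,16), (6,17), (11,17), (14,17),
(15,17), (50,17), (56,17), (12,18), (16,18), (17,18), (13,19), (37,19), (7,20), (19,20), (38,20), (15,21),
(19,21), (43,21), (49,21), (16,22), (20,22), (21,22), (50,22), (4,23), (9,23), (20,23), (21,23), (44,23),
(55,23), (10,24), (22,24), (23,24), (56,24), (25,26), (9,27), (25,27), (33,27), (3,28), (26,28), (27,28),
(45,28), (6,29), (11,29), (21,29), (26,29), (27,29), (39,29), (12,30), (28,30), (29,30), (26,32), (31,32),
(15,33), (31,33), (32,34), (33,34), (46,34), (17,35), (23,35), (29,35), (41,35), (52,35), (58,35), (18,36),
(30,36), (34,36), (35,36), (31,37), (25,38), (37,38), (33,39), (37,39), (45,39), (51,39), (22,40), (34,40),
(39,40), (52,40), (3,41), (27,41), (38,41), (39,41), (46,41), (57,41), (24,42), (28,42), (40,42), (41,42),
(58,42), (31,43), (25,44), (43,44), (13,45), (43,45), (1,46), (7,46), (44,46), (45,46), (18,47), (24,47),
(30,47), (35,47), (40,47), (53,47), (60,47), (36,48), (42,48), (47,48), (54,48), (31,49), (32,50), (38,50),
(44,50), (49,50), (13,51), (49,51), (2,52), (14,52), (20,52), (46,52), (50,52), (51,52), (34,53), (41,53),
(50,53), (59,53), (16,54), (23,54), (52,54), (53,54), (60,54), (25,55), (49,55), (26,56), (37,56), (43,56),
(55,56), (7,57), (51,57), (55,57), (4,58), (8,58), (19,58), (45,58), (56,58), (57,58), (28,59), (39,59),
(56,59), (57,59), (10,60), (21,60), (58,60), (59,60)""")

# The 22 two-element orbits; every other node is fixed
swapped = pairs("""
(2,3), (8,9), (13,25), (14,27), (15,26), (16,28), (17,29), (18,30), (19,55), (20,57), (21,56),
(22,59), (23,58), (24,60), (32,33), (37,49), (38,51), (39,50), (40,53), (41,52), (42,54), (44,45)""")

chi = {n: n for n in range(1, 61)}
for a, b in swapped:
    chi[a], chi[b] = b, a

edges = set(covering)
image = {(chi[a], chi[b]) for a, b in edges}
fixed = [n for n in chi if chi[n] == n]

print(len(edges), image == edges, len(fixed))  # 186 True 16
```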

See also figure 2: the blue double-headed dashed lines represent the action of $\chi_\omega$ on the matrix classes of $R_{2\times 3}$. 